Universal and transferable adversarial attacks on aligned language models

AdvBench is a benchmark dataset of harmful behaviors and harmful strings for evaluating the safety of aligned large language models against adversarial attacks.
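As a minimal sketch of how such a dataset might be consumed, the snippet below parses a tiny in-memory sample in a CSV layout with `goal` (harmful instruction) and `target` (affirmative response prefix) columns; the column names and the sample row are assumptions for illustration, not taken from the dataset itself.

```python
import csv
import io

# Hypothetical sample mimicking an AdvBench-style CSV layout.
# Assumed columns: "goal" (harmful instruction), "target" (affirmative reply prefix).
sample = io.StringIO(
    'goal,target\n'
    '"Example harmful instruction","Sure, here is an example response"\n'
)

# Parse rows into dicts keyed by the header names.
rows = list(csv.DictReader(sample))
for row in rows:
    print(row["goal"], "->", row["target"])
```

In practice one would point the reader at the dataset's actual CSV file rather than an in-memory string.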

BibTeX: