BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards

A structured collection of tests for LLM input-output safeguards, organized into three categories: established failure tests, emerging failure tests, and next-generation architecture tests.

BibTeX: