BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards
A structured collection of tests for input-output safeguards, including established failure tests, emerging failure tests, and next-gen architecture tests.
BibTeX: