BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

This dataset is used in the paper to evaluate how effectively the BEEAR method mitigates safety backdoors in instruction-tuned large language models (LLMs).

Cite this as

Yi Zeng, Weiyu Sun, Tran Ngoc Huynh, Dawn Song, Bo Li, Ruoxi Jia (2025). Dataset: BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models. https://doi.org/10.57702/x96m7qf6

DOI retrieved: January 3, 2025

Additional Info

Created: January 3, 2025
Last update: January 3, 2025
Defined In: https://doi.org/10.48550/arXiv.2406.17092
Author: Yi Zeng
More Authors: Weiyu Sun, Tran Ngoc Huynh, Dawn Song, Bo Li, Ruoxi Jia