BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

doi:10.57702/x96m7qf6

This dataset was used in the paper to evaluate how effectively the BEEAR method mitigates safety backdoors in instruction-tuned LLMs.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Yi Zeng, Weiyu Sun, Tran Ngoc Huynh, Dawn Song, Bo Li, Ruoxi Jia (2025). Dataset: BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models. https://doi.org/10.57702/x96m7qf6

DOI retrieved: January 3, 2025

Additional Info

Field	Value
Created	January 3, 2025
Last update	January 3, 2025
Defined In	https://doi.org/10.48550/arXiv.2406.17092
Author	Yi Zeng
More Authors	Weiyu Sun, Tran Ngoc Huynh, Dawn Song, Bo Li, Ruoxi Jia