BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
Data and Resources
- Original Metadata (JSON): The JSON representation of the dataset with its distributions, based on DCAT.
Cite this as
Yi Zeng, Weiyu Sun, Tran Ngoc Huynh, Dawn Song, Bo Li, Ruoxi Jia (2025). Dataset: BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models. https://doi.org/10.57702/x96m7qf6
DOI retrieved: January 3, 2025
Additional Info
Field | Value |
---|---|
Created | January 3, 2025 |
Last update | January 3, 2025 |
Defined In | https://doi.org/10.48550/arXiv.2406.17092 |
Author | Yi Zeng |
More Authors | Weiyu Sun, Tran Ngoc Huynh, Dawn Song, Bo Li, Ruoxi Jia |