Anthropic red-team dataset

The Anthropic red-team dataset is a significant open-access dataset aimed at improving AI safety through training preference models and assessing their safety.

BibTex: