Anthropic red-team dataset

The Anthropic red-team dataset is a significant open-access dataset aimed at improving AI safety; it is used to train preference models and to assess model safety.
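
A minimal loading sketch, under the assumption (not stated on this page) that the red-team transcripts are mirrored in the Anthropic/hh-rlhf repository on the Hugging Face Hub under the "red-team-attempts" directory; the repository name, directory, and field name are assumptions to adjust if the hosting differs.

    # Sketch: load the red-team transcripts via the Hugging Face `datasets` library.
    # Assumed location: Anthropic/hh-rlhf, data_dir="red-team-attempts".
    from datasets import load_dataset

    red_team = load_dataset(
        "Anthropic/hh-rlhf",
        data_dir="red-team-attempts",
        split="train",
    )

    # "transcript" is the assumed field holding one human red-teaming conversation.
    print(red_team[0]["transcript"])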

Cite this as

Bahareh Harandizadeh, Abel Salinas, Fred Morstatter (2024). Dataset: Anthropic red-team dataset. https://doi.org/10.57702/bup1brhp

DOI retrieved: December 2, 2024

Additional Info

Field         Value
Created       December 2, 2024
Last update   December 2, 2024
Defined In    https://doi.org/10.48550/arXiv.2403.14988
Author        Bahareh Harandizadeh
More Authors  Abel Salinas, Fred Morstatter