Anthropic-HH-RLHF Dataset

The paper uses the Anthropic-HH-RLHF dataset, a human preference dataset for training models with reinforcement learning from human feedback (RLHF).
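
To illustrate how preference data of this kind is commonly prepared for RLHF, the sketch below splits one record into a shared prompt and two competing responses. It assumes each record carries a "chosen" and a "rejected" string that share a dialogue prefix and mark turns with "Human:" and "Assistant:"; this page does not document the schema, so treat that as an assumption. The helper name split_pair and the example record are hypothetical and made up for illustration.

```python
# Minimal sketch, not an official loader for this dataset.
# Assumption: a record has "chosen" and "rejected" strings that share the
# same dialogue prefix and differ only in the final assistant turn.

ASSISTANT_TAG = "\n\nAssistant:"


def split_pair(record):
    """Split a preference record into prompt, chosen reply, and rejected reply."""
    chosen, rejected = record["chosen"], record["rejected"]

    # The prompt is everything up to and including the last assistant tag.
    cut = chosen.rfind(ASSISTANT_TAG)
    if cut == -1:
        raise ValueError("no assistant turn found in 'chosen'")
    cut += len(ASSISTANT_TAG)
    prompt = chosen[:cut]

    # Both texts are expected to start with the same prompt.
    if not rejected.startswith(prompt):
        raise ValueError("'chosen' and 'rejected' do not share a prompt")

    return {
        "prompt": prompt,
        "chosen": chosen[cut:].strip(),
        "rejected": rejected[cut:].strip(),
    }


# Hypothetical record, invented for illustration (not taken from the dataset).
example = {
    "chosen": "\n\nHuman: How do I boil an egg?\n\nAssistant: Simmer it for about ten minutes.",
    "rejected": "\n\nHuman: How do I boil an egg?\n\nAssistant: I don't know.",
}

pair = split_pair(example)
print(pair["chosen"])
print(pair["rejected"])
```

Real records would presumably come from the Hugging Face repository listed under Homepage, for example via the `datasets` library's `load_dataset("Anthropic/hh-rlhf")`; that step is left out here so the sketch runs without network access.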

Cite this as

Anthropic (2024). Dataset: Anthropic-HH-RLHF Dataset. https://doi.org/10.57702/vwhenrmb

DOI retrieved: December 2, 2024

Additional Info

Field        Value
Created      December 2, 2024
Last update  December 2, 2024
Author       Anthropic
Homepage     https://huggingface.co/datasets/Anthropic/hh-rlhf