Anthropic-HH-RLHF Dataset

The paper uses the Anthropic-HH-RLHF dataset, a human preference dataset for reinforcement learning from human feedback (RLHF).

BibTeX: