Toxic-DPO Dataset

The dataset used in the paper is the Toxic-DPO dataset, which is used for reinforcement learning from human feedback.

BibTex: