Toxic-DPO Dataset

The Toxic-DPO dataset is used in the paper for reinforcement learning from human feedback (RLHF).

BibTeX:

@dataset{Unalignment_2024,
  abstract = {The Toxic-DPO dataset is used in the paper for reinforcement learning from human feedback.},
  author = {Unalignment},
  doi = {10.57702/eflhjtjl},
  institution = {No Organization},
  keywords = {Human Feedback, RLHF, Reinforcement Learning},
  month = dec,
  publisher = {TIB},
  title = {Toxic-DPO Dataset},
  url = {https://service.tib.eu/ldmservice/dataset/toxic-dpo-dataset},
  year = {2024}
}