HH-RLHF

HH-RLHF is a dataset of human preference comparisons for reinforcement learning from human feedback (RLHF).
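As a rough illustration of how a preference record is typically used, the sketch below builds one made-up chosen/rejected pair and computes the pairwise loss commonly used when training a reward model on such data. The field names `chosen` and `rejected` are assumed to match the public release; the example text, the `toy_reward` scorer, and the `pairwise_loss` helper are hypothetical and not part of the dataset.

```python
import math

# Hypothetical preference record. The "chosen"/"rejected" field names are
# assumed to match the public release; the text itself is made up.
record = {
    "chosen": "\n\nHuman: How do I boil an egg?\n\nAssistant: Simmer it in boiling water for about ten minutes.",
    "rejected": "\n\nHuman: How do I boil an egg?\n\nAssistant: I don't know.",
}


def toy_reward(text: str) -> float:
    """Stand-in scorer (hypothetical): longer transcripts score higher.

    A real reward model would be a learned network, not a length heuristic.
    """
    return len(text) / 100.0


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected))."""
    # -log(sigmoid(d)) equals log(1 + exp(-d)).
    return math.log1p(math.exp(-(r_chosen - r_rejected)))


loss = pairwise_loss(toy_reward(record["chosen"]), toy_reward(record["rejected"]))
print(f"pairwise loss: {loss:.4f}")
```

The loss falls below log(2) whenever the scorer ranks the chosen response above the rejected one, and rises above it otherwise.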

BibTeX: