Dense Reward for Free in RLHF

The paper does not explicitly describe its dataset; it states only that it is a preference dataset for language models.

BibTeX: