- A general theoretical paradigm to understand learning from human preferences
  The paper proposes a preference-optimization approach to aligning language models with human preferences in reward-free RLHF, i.e. optimizing directly on preference data without fitting an explicit reward model; see the objective sketched after this list.
- PKU-SafeRLHF dataset
  The dataset used in the paper is the PKU-SafeRLHF dataset.
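For context, a minimal sketch of the general preference-optimization (ΨPO) objective, assuming the standard formulation from that paper; here ρ is the prompt distribution, μ the behavior policy that produced the comparison completions, π_ref the reference policy, p*(y ≻ y' | x) the human preference probability, Ψ a non-decreasing map, and τ the KL-regularization strength:

```latex
% Sketch of the general preference-optimization objective (PsiPO),
% assuming the standard formulation; symbols as defined in the note above.
\[
\max_{\pi}\;
\mathbb{E}_{x \sim \rho}\,
\mathbb{E}_{y \sim \pi(\cdot \mid x),\; y' \sim \mu(\cdot \mid x)}
\bigl[\Psi\bigl(p^{*}(y \succ y' \mid x)\bigr)\bigr]
\;-\; \tau\, D_{\mathrm{KL}}\bigl(\pi \,\|\, \pi_{\mathrm{ref}}\bigr)
\]
% Taking Psi to be the logit function recovers the usual RLHF / DPO
% objective under a Bradley-Terry preference model; taking Psi to be
% the identity gives the IPO special case.
```

As I read the paper, the identity choice (IPO) is motivated as being more robust than the logit choice when preferences are nearly deterministic, since the regularization then remains effective.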