- Do-Not-Answer dataset
The Do-Not-Answer dataset is designed to test the safety performance of Large Language Models (LLMs).
- HH-RLHF dataset
The HH-RLHF dataset is used to evaluate the performance of the proposed Compositional Preference Models (CPMs).