3 datasets found

Tags: Human Feedback

  • HH-RLHF

    HH-RLHF is a human preference dataset for reinforcement learning from human feedback (RLHF).
  • SHP dataset

    The SHP dataset is used to evaluate the performance of the proposed Compositional Preference Models (CPMs).
  • HH-RLHF dataset

    The HH-RLHF dataset is used to evaluate the performance of the proposed Compositional Preference Models (CPMs).