1 dataset found

Groups: Programming
Tags: Human Feedback

  • APPS

    The dataset used in the paper for training and testing the DPO and PPO models.
You can also access this registry using the API (see API Docs).