3 datasets found

Groups: Preference Optimization; Organizations: No Organization; Formats: JSON

  • Ultrafeedback

    Ultrafeedback, the dataset used in the paper, is a preference dataset containing 63k preference pairs sampled from models other than the SFT model.
  • Anthropic’s Helpfulness and Harmlessness

    Anthropic’s Helpfulness and Harmlessness datasets are used for preference optimization. They consist of a set of instructions and their corresponding responses.
  • AlpacaFarm

    AlpacaFarm is a large-scale dataset for preference optimization, consisting of a set of instructions and their corresponding responses.