3 datasets found

Groups: Human Feedback

  • UltraRM-13B

    The UltraRM-13B dataset is a collection of human feedback for language model training.
  • AlpacaFarm

    The AlpacaFarm dataset is a large-scale dataset for preference optimization, consisting of instructions and their corresponding responses.
  • Anthropic-HH

    The Anthropic-HH dataset is a collection of human feedback for language model training.