1 dataset found

Tags: preference alignment

  • Ultrafeedback

    The paper uses Ultrafeedback, a preference dataset containing 63k preference pairs sampled from models other than the SFT model.
You can also access this registry using the API (see API Docs).