1 dataset found

Tags: sDPO

  • Ultrafeedback

    The paper uses Ultrafeedback, a preference dataset of 63k preference pairs sampled from models other than the SFT model.
You can also access this registry using the API (see API Docs).