Ultrafeedback

The dataset used in the paper is Ultrafeedback, which is a preference dataset that contains 63k preference pairs sampled from models other than the SFT model.

BibTex: