- UltraFeedback
The dataset used in the paper is UltraFeedback, a preference dataset of 63k preference pairs whose responses are sampled from models other than the SFT model.
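A quick way to inspect those pairs is a minimal sketch like the one below, assuming the commonly used binarized UltraFeedback release on the Hugging Face Hub (`HuggingFaceH4/ultrafeedback_binarized`); the split and column names follow that release and may differ for other versions.

```python
# Minimal sketch: load and inspect UltraFeedback preference pairs.
# Assumes the HuggingFaceH4/ultrafeedback_binarized release; other
# UltraFeedback variants may use different splits/columns.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
example = ds[0]

print(example["prompt"])                    # the instruction
print(example["chosen"][-1]["content"])     # preferred response (chat format)
print(example["rejected"][-1]["content"])   # dispreferred response
```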
- A General Theoretical Paradigm to Understand Learning from Human Preferences
The paper proposes a novel approach to aligning language models with human preferences, focusing on preference optimization that learns directly from preference data in a reward-free RLHF setup.
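The paper's practical instantiation of this idea is the IPO loss, which regresses the policy-versus-reference log-ratio margin toward a fixed target instead of pushing it through DPO's sigmoid. A minimal PyTorch sketch follows, assuming summed per-sequence log-probabilities are precomputed; the variable names and default value of `tau` are illustrative, not taken from the paper's code.

```python
# Minimal sketch of the IPO loss, assuming precomputed per-sequence
# log-probs under the policy and the frozen reference model.
import torch

def ipo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             tau: float = 0.1) -> torch.Tensor:
    # h: log-ratio margin between chosen and rejected responses,
    # measured relative to the reference model
    h = (policy_chosen_logps - ref_chosen_logps) - (
        policy_rejected_logps - ref_rejected_logps)
    # IPO regresses h toward 1/(2*tau) with a squared loss, which keeps
    # the margin bounded rather than driving it to infinity as DPO can
    # when preferences are near-deterministic
    return ((h - 1.0 / (2.0 * tau)) ** 2).mean()
```

Because the target margin is finite, the squared loss keeps KL regularization toward the reference model effective even on deterministic preference data, which is the failure mode of DPO that the paper analyzes.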