Preference Optimization - Groups

Ultrafeedback

Ultrafeedback is the preference dataset used in the paper. It contains 63k preference pairs whose responses were sampled from models other than the SFT model.
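As a rough illustration of what a "preference pair" looks like in practice, here is a minimal sketch of one record and its serialization to JSON Lines. The field names `prompt`, `chosen`, and `rejected`, the `PreferencePair` class, and the example text are all assumptions made for illustration; they are not the dataset's documented schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PreferencePair:
    """One preference example: a prompt plus a preferred and a dispreferred response.

    Field names are hypothetical, not taken from the dataset itself.
    """
    prompt: str
    chosen: str    # response the annotator (or judge model) preferred
    rejected: str  # response that lost the comparison


def to_jsonl(pairs):
    """Serialize preference pairs as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(p)) for p in pairs)


# Made-up record, for illustration only.
pairs = [
    PreferencePair(
        prompt="Name a prime number.",
        chosen="7 is a prime number.",
        rejected="9 is a prime number.",
    )
]
jsonl = to_jsonl(pairs)
print(jsonl)
```

Preference optimization methods typically consume records of this shape, raising the likelihood of `chosen` relative to `rejected` for the same `prompt`.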

Anthropic’s Helpfulness and Harmlessness

Anthropic’s Helpfulness and Harmlessness datasets are used for preference optimization. They consist of a set of instructions and their corresponding responses.

AlpacaFarm

AlpacaFarm is a large-scale dataset for preference optimization. It consists of a set of instructions and their corresponding responses.
