- UltraRM-13B
  UltraRM-13B is a 13B-parameter reward model trained on preference feedback; it scores candidate responses for language model training.
- AlpacaFarm
  The AlpacaFarm dataset is a dataset for preference optimization, consisting of instructions and their corresponding responses.
- Anthropic-HH
  The Anthropic-HH dataset is a collection of human preference feedback on helpfulness and harmlessness for language model training.
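
As a rough illustration of how a pairwise preference record of the kind described above might be represented, here is a minimal Python sketch. The `PreferencePair` class and the `to_training_rows` helper are hypothetical names introduced for this example, and the sample record is invented; the actual schemas of the resources listed above may differ and should be checked against their documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One pairwise comparison: two responses to the same prompt (hypothetical schema)."""
    prompt: str
    chosen: str    # response the annotator preferred
    rejected: str  # response the annotator did not prefer


def to_training_rows(pairs):
    """Flatten pairs into (prompt, response, label) rows, where label 1 marks the preferred response."""
    rows = []
    for pair in pairs:
        rows.append((pair.prompt, pair.chosen, 1))
        rows.append((pair.prompt, pair.rejected, 0))
    return rows


# Invented sample record, for illustration only.
example = PreferencePair(
    prompt="How do I boil an egg?",
    chosen="Place the egg in boiling water for about ten minutes.",
    rejected="I don't know.",
)
rows = to_training_rows([example])
```

Each pair yields two labeled rows, which is one simple way to feed comparison data to a classifier-style reward model; pairwise ranking losses would instead consume the pair directly.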