- Anthropic Helpfulness Base eval: The dataset used in the paper is the Anthropic Helpfulness Base eval dataset.
- Anthropic Helpfulness Base: The datasets used in the paper are the Anthropic Helpfulness Base train dataset and the Anthropic Helpfulness Base eval dataset.
- Dense Reward for Free in RLHF: The dataset used in the paper is not explicitly described; it is noted only to be a preference dataset for language models.
- HIVE: Harnessing Human Feedback for Instructional Visual Editing: The paper Harnessing Human Feedback for Instructional Visual Editing (HIVE) uses a human-feedback dataset collected for instructional visual editing.
- Anthropic HH dataset: The Anthropic HH dataset is a general-purpose preference dataset for helpfulness and harmlessness (a hedged loading sketch appears after this list).
- Training a helpful and harmless assistant with reinforcement learning from human feedback: The authors propose a novel approach that incorporates parameter-efficient tuning to better optimize control tokens, benefiting controllable generation.
- Anthropic's HH-RLHF and OpenAI's summarization datasets: The datasets used in the paper are Anthropic's HH-RLHF dataset and OpenAI's summarization dataset.
- AlpacaFarm: The AlpacaFarm dataset is a large-scale dataset for preference optimization, consisting of instructions and their corresponding responses (see the second sketch after this list).
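For reference, the Anthropic HH data is distributed on the Hugging Face Hub. A minimal sketch of loading it and inspecting one preference pair, assuming the `datasets` library and the `Anthropic/hh-rlhf` repo id, might look like:

```python
# Minimal sketch: load the Anthropic HH preference data and inspect one pair.
# Assumes the `datasets` library is installed and that the data is hosted on
# the Hugging Face Hub under the repo id "Anthropic/hh-rlhf".
from datasets import load_dataset

hh = load_dataset("Anthropic/hh-rlhf")  # DatasetDict with "train" and "test" splits

example = hh["train"][0]
# Each record pairs a preferred ("chosen") and a dispreferred ("rejected")
# conversation transcript for the same prompt.
print(example["chosen"][:200])
print(example["rejected"][:200])
```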
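A similar hedged sketch for AlpacaFarm, assuming the public Hugging Face release under the repo id `tatsu-lab/alpaca_farm` with a human-preference config named `alpaca_human_preference` (both names, and the record schema noted in the comments, are assumptions):

```python
# Hedged sketch: load AlpacaFarm preference data. The repo id
# "tatsu-lab/alpaca_farm" and the config name "alpaca_human_preference"
# are assumptions based on the public Hugging Face release.
from datasets import load_dataset

af = load_dataset("tatsu-lab/alpaca_farm", "alpaca_human_preference")

split = next(iter(af.values()))  # take whichever split the config exposes
row = split[0]
# Records are expected to carry an instruction, two candidate responses
# (e.g. output_1 / output_2), and an annotator preference between them;
# printing the keys confirms the actual schema.
print(sorted(row))
```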