Human Feedback - Groups

Anthropic Helpfulness Base eval

The dataset used in the paper is the Anthropic Helpfulness Base eval dataset.

Dataset
JSON

Anthropic Helpfulness Base

The dataset used in the paper is the Anthropic Helpfulness Base train dataset and the Anthropic Helpfulness eval dataset.

Dataset
JSON

HIVE: Harnessing Human Feedback for Instructional Visual Editing

This is the dataset used in the paper HIVE: Harnessing Human Feedback for Instructional Visual Editing.

Dataset
JSON

Anthropic HH dataset

The Anthropic HH dataset is a general-purpose preference dataset for helpfulness and harmlessness.

Dataset
JSON
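
To illustrate what a preference record of this kind might look like, here is a minimal sketch. It assumes each JSON line carries a "chosen" and a "rejected" field, each holding a full dialogue transcript with "Human:" and "Assistant:" turns; this listing does not state that layout, so check it against the actual files. The sample record and the helper name split_pair are hypothetical.

```python
import json

# Hypothetical record in the assumed "chosen"/"rejected" layout (not real data).
SAMPLE_LINE = json.dumps({
    "chosen": "\n\nHuman: How do I boil an egg?"
              "\n\nAssistant: Simmer it for about ten minutes.",
    "rejected": "\n\nHuman: How do I boil an egg?"
                "\n\nAssistant: I don't know.",
})


def split_pair(line):
    """Return (shared_prompt, chosen_reply, rejected_reply) for one JSON line."""
    record = json.loads(line)
    marker = "\n\nAssistant:"
    # Split each transcript at its last assistant turn: everything before it
    # is the prompt, everything after it is the final reply.
    chosen_prompt, _, chosen_reply = record["chosen"].rpartition(marker)
    rejected_prompt, _, rejected_reply = record["rejected"].rpartition(marker)
    if chosen_prompt != rejected_prompt:
        raise ValueError("chosen and rejected transcripts do not share a prompt")
    return chosen_prompt.strip(), chosen_reply.strip(), rejected_reply.strip()


prompt, chosen, rejected = split_pair(SAMPLE_LINE)
print(prompt)
print("preferred:", chosen)
print("dispreferred:", rejected)
```

Splitting at the last assistant turn keeps multi-turn prompts intact, under the assumption that the two transcripts differ only in their final reply.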

Training a helpful and harmless assistant with reinforcement learning from hu...

The authors propose a novel approach that incorporates parameter-efficient tuning to better optimize control tokens, thus benefitting controllable generation.

Dataset
JSON

AlpacaFarm

The AlpacaFarm dataset is a large-scale dataset for preference optimization, consisting of instructions and their corresponding responses.

Dataset
JSON