Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Human Feedback
Differences in Fairness Preferences
A crowdsourced dataset for studying differences in fairness preferences depending on demographic identities.
CodeContest
The dataset used in the paper to train and test the DPO and PPO models.
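Several entries in this catalog involve DPO training on preference pairs. As a point of reference, the sketch below shows the standard DPO objective over per-response log-probabilities; it is a generic illustration with assumed tensor inputs, not the training code from the cited paper.

```python
# Minimal sketch of the DPO loss on (chosen, rejected) preference pairs.
# Log-probabilities are assumed to be summed over response tokens.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss for a batch of preference pairs."""
    # Log-ratio of policy vs. reference model for each response.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Push the policy to prefer the chosen response relative to the reference.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Dummy per-response log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -8.5]), torch.tensor([-14.0, -9.0]),
                torch.tensor([-12.5, -8.7]), torch.tensor([-13.5, -9.2]))
print(loss)
```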
Training a helpful and harmless assistant with reinforcement learning from human feedback
The authors propose a novel approach that incorporates parameter-efficient tuning to better optimize control tokens, thus benefiting controllable generation.
SHP dataset
The SHP (Stanford Human Preferences) dataset is used to evaluate the proposed Compositional Preference Models (CPMs).
HH-RLHF dataset
The HH-RLHF dataset is used to evaluate the proposed Compositional Preference Models (CPMs).
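Both of the evaluation datasets above are pairwise preference collections. The sketch below shows one way to load them with the Hugging Face `datasets` library; the Hub IDs `stanfordnlp/SHP` and `Anthropic/hh-rlhf`, and the split names, are the commonly used public copies and are an assumption here, not taken from the cited paper.

```python
# Load the two preference datasets from the Hugging Face Hub (assumed Hub IDs).
from datasets import load_dataset

shp = load_dataset("stanfordnlp/SHP", split="validation")
hh = load_dataset("Anthropic/hh-rlhf", split="test")

# SHP rows contain a post plus two candidate replies and a preference label;
# HH-RLHF rows contain a "chosen" and a "rejected" dialogue transcript.
print(shp[0].keys())
print(hh[0]["chosen"][:200])
```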
Toxic-DPO Dataset
The Toxic-DPO dataset is used in the paper for reinforcement learning from human feedback.
Anthropic-HH-RLHF Dataset
The Anthropic-HH-RLHF dataset is used in the paper for reinforcement learning from human feedback.
UltraRM-13B
UltraRM-13B is a reward model trained on a large collection of preference feedback for language model training.
AlpacaFarm
AlpacaFarm is a large-scale dataset for preference optimization, consisting of instructions and their corresponding responses.
Anthropic-HH
The Anthropic-HH dataset is a collection of human preference comparisons over assistant responses, used for language model training.
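Most of the collections listed here consist of (chosen, rejected) comparisons used to train or evaluate reward models such as UltraRM-13B. The sketch below shows the standard pairwise (Bradley-Terry) reward-model loss over such comparisons; it is a generic illustration with dummy scores, not code from any of the cited works.

```python
# Pairwise (Bradley-Terry) reward-model loss over preference comparisons.
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor,
                         rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood that the chosen response outranks the rejected one."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy scalar rewards for a batch of four preference pairs.
chosen = torch.tensor([1.2, 0.3, 0.8, 2.0])
rejected = torch.tensor([0.5, 0.1, 1.1, 0.0])
print(pairwise_reward_loss(chosen, rejected))
```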