- Toxic-DPO Dataset
  A preference-pair dataset used in the paper for reinforcement learning from human feedback.
- Anthropic-HH-RLHF Dataset
  Anthropic's human-preference dataset on helpfulness and harmlessness, used in the paper for reinforcement learning from human feedback.
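Both datasets store human preferences as pairs of a preferred and a dispreferred response. As a minimal sketch (the `chosen`/`rejected` field names follow the Hugging Face `Anthropic/hh-rlhf` release and are an assumption here, not something stated in the paper), a record and a helper that extracts training pairs might look like:

```python
# One preference record in the chosen/rejected format used by
# RLHF-style preference datasets. Field names are assumed from
# the Anthropic/hh-rlhf convention on Hugging Face.
record = {
    "chosen": "Human: How do I bake bread?\n\nAssistant: Mix flour, water, yeast, and salt, then let the dough rise before baking.",
    "rejected": "Human: How do I bake bread?\n\nAssistant: I don't know.",
}

def to_pairs(records):
    """Turn records into (preferred, dispreferred) tuples for preference training."""
    return [(r["chosen"], r["rejected"]) for r in records]

pairs = to_pairs([record])
print(len(pairs))  # → 1
```

A reward model or DPO-style objective then learns to score the first element of each tuple above the second.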