HH-RLHF

doi:10.57702/xvs2nhsa

The HH-RLHF dataset is a human preference dataset for reinforcement learning from human feedback.
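
As a rough illustration of what a human preference record looks like, the sketch below models one comparison as a pair of dialogue transcripts and splits off the prompt they share. It assumes the layout of the publicly distributed HH-RLHF data, where each record has a `chosen` and a `rejected` transcript with turns introduced by "Human:" and "Assistant:"; this record does not itself specify the schema. The `PreferencePair` class, the `split_shared_prompt` helper, and the sample dialogue are hypothetical and made up for illustration.

```python
from dataclasses import dataclass

# Assumed turn marker, based on the publicly distributed HH-RLHF transcripts.
ASSISTANT_MARKER = "\n\nAssistant:"


@dataclass(frozen=True)
class PreferencePair:
    """One human comparison: the preferred and the dispreferred transcript."""
    chosen: str
    rejected: str


def split_shared_prompt(pair: PreferencePair) -> tuple[str, str, str]:
    """Return (shared prompt, chosen response, rejected response).

    The prompt is everything up to and including the last assistant marker
    that both transcripts have in common.
    """
    # Length of the longest common prefix of the two transcripts.
    common_len = 0
    for a, b in zip(pair.chosen, pair.rejected):
        if a != b:
            break
        common_len += 1

    cut = pair.chosen[:common_len].rfind(ASSISTANT_MARKER)
    if cut == -1:
        raise ValueError("transcripts do not share an assistant turn marker")
    cut += len(ASSISTANT_MARKER)
    return pair.chosen[:cut], pair.chosen[cut:], pair.rejected[cut:]


# Invented sample record, not taken from the dataset.
example = PreferencePair(
    chosen="\n\nHuman: How do I boil an egg?\n\nAssistant: Simmer it for about ten minutes.",
    rejected="\n\nHuman: How do I boil an egg?\n\nAssistant: I don't know.",
)
prompt, preferred, dispreferred = split_shared_prompt(example)
```

A reward model trained on such pairs is typically asked to score `preferred` above `dispreferred` given the same `prompt`.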

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu (2024). Dataset: HH-RLHF. https://doi.org/10.57702/xvs2nhsa

DOI retrieved: December 16, 2024

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2404.05530
Author	Shusheng Xu
More Authors	Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu
Homepage	https://huggingface.co/OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1