Training a helpful and harmless assistant with reinforcement learning from human feedback

doi:10.57702/ueb4xymx

The authors propose a novel approach that incorporates parameter-efficient tuning to better optimize control tokens, thus benefitting controllable generation.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan (2024). Dataset: Training a helpful and harmless assistant with reinforcement learning from human feedback. https://doi.org/10.57702/ueb4xymx

DOI retrieved: December 2, 2024

Additional Info

Field	Value
Created	December 2, 2024
Last update	December 2, 2024
Defined In	https://doi.org/10.48550/arXiv.2312.09244
Citation	https://doi.org/10.48550/arXiv.2403.16649 https://doi.org/10.48550/arXiv.2310.00819 https://doi.org/10.48550/arXiv.2307.01139 https://doi.org/10.48550/arXiv.2406.15568
Author	Yuntao Bai
More Authors	Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan
Homepage	https://arxiv.org/abs/2204.05862