A general theoretical paradigm to understand learning from human preferences

doi:10.57702/lafwgps7

The paper proposes a novel approach to aligning language models with human preferences, focusing on the use of preference optimization in reward-free RLHF.

Data and Resources

Original Metadata (JSON)
The json representation of the dataset with its distributions based on DCAT.

Cite this as

Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, Daniele Calandriello (2024). Dataset: A general theoretical paradigm to understand learning from human preferences. https://doi.org/10.57702/lafwgps7

DOI retrieved: December 16, 2024

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2405.20304
Citation	https://doi.org/10.48550/arXiv.2407.01648
Author	Mohammad Gheshlaghi Azar
More Authors	Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, Daniele Calandriello
Homepage	https://arxiv.org/abs/2310.12036