A general theoretical paradigm to understand learning from human preferences

The paper introduces a general objective for learning from human preferences (ΨPO) that subsumes reward-based RLHF and DPO as special cases, and derives from it a practical method (IPO) that aligns language models directly on pairwise preference data, without fitting a reward model or running reinforcement learning.
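
As context for the description above: the practical method derived in the paper regresses a policy-vs-reference log-likelihood-ratio margin on each preference pair toward a constant set by the regularisation strength, instead of fitting a reward model. Below is a minimal sketch of such a squared preference loss, assuming per-example log-probabilities have already been computed; the function and argument names are illustrative assumptions, not the authors' code.

    import torch

    def ipo_style_loss(policy_logp_w, policy_logp_l,
                       ref_logp_w, ref_logp_l, tau=0.1):
        """Squared preference loss on (preferred, dispreferred) completion pairs.

        Inputs are per-example log-probabilities under the trained policy and a
        frozen reference policy; tau is the KL-regularisation strength.
        """
        # Log-ratio margin between the preferred (w) and dispreferred (l) completions.
        h = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
        # Regress the margin toward 1 / (2 * tau) rather than pushing it to infinity,
        # which is what a logistic (DPO-style) loss would implicitly encourage.
        return ((h - 1.0 / (2.0 * tau)) ** 2).mean()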


Cite this as

Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, Daniele Calandriello (2024). Dataset: A general theoretical paradigm to understand learning from human preferences. https://doi.org/10.57702/lafwgps7

DOI retrieved: December 16, 2024

Additional Info

Created: December 16, 2024
Last update: December 16, 2024
Defined In: https://doi.org/10.48550/arXiv.2405.20304
Citation: https://doi.org/10.48550/arXiv.2407.01648
Author: Mohammad Gheshlaghi Azar
More Authors: Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, Daniele Calandriello
Homepage: https://arxiv.org/abs/2310.12036