
A general theoretical paradigm to understand learning from human preferences

The paper presents a general theoretical framework for learning from human preferences, analyzing preference-optimization objectives that align language models directly with pairwise preference data, without fitting an explicit reward model (reward-free RLHF).
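As a rough, non-authoritative sketch of the kind of objective this line of work studies (not the paper's exact formulation), the snippet below shows a squared-margin preference loss computed directly from policy and reference-model log-probabilities. The function name, tensor arguments, and the coefficient `tau` are assumptions made for the illustration.

```python
import torch

def preference_margin_loss(policy_logp_chosen, policy_logp_rejected,
                           ref_logp_chosen, ref_logp_rejected, tau=0.1):
    """Squared-margin preference loss on summed token log-probabilities.

    Each argument is a 1-D tensor over a batch of (chosen, rejected)
    completion pairs; `tau` is an assumed regularization coefficient that
    controls how far the policy may drift from the reference model.
    """
    # Log-ratio margin of the preferred over the dispreferred completion,
    # measured relative to a frozen reference policy.
    margin = (policy_logp_chosen - ref_logp_chosen) - (
        policy_logp_rejected - ref_logp_rejected)
    # Regress the margin toward a finite target 1/(2*tau) rather than
    # maximizing it without bound; no explicit reward model is fit.
    return ((margin - 1.0 / (2.0 * tau)) ** 2).mean()

# Toy check with random numbers standing in for model log-probabilities.
if __name__ == "__main__":
    b = 8
    print(float(preference_margin_loss(torch.randn(b), torch.randn(b),
                                       torch.randn(b), torch.randn(b))))
```

In this sketch, regressing the log-ratio margin toward a finite target (rather than pushing it to infinity) is what keeps the learned policy close to the reference model; treat it only as an illustration of the reward-free preference-optimization setting the paper addresses.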

Data and Resources

This dataset has no data

Cite this as

Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, Daniele Calandriello (2024). Dataset: A general theoretical paradigm to understand learning from human preferences. https://doi.org/10.57702/lafwgps7

Private DOI: This DOI is not yet resolvable.
It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Created: December 16, 2024
Last update: December 16, 2024
Defined In: https://doi.org/10.48550/arXiv.2405.20304
Citation: https://doi.org/10.48550/arXiv.2407.01648
Author: Mohammad Gheshlaghi Azar
More Authors: Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, Daniele Calandriello
Homepage: https://arxiv.org/abs/2310.12036