A general theoretical paradigm to understand learning from human preferences

The paper proposes a novel approach to aligning language models with human preferences, focusing on the use of preference optimization in reward-free RLHF.

Data and Resources

This dataset has no data

Cite this as

Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, Daniele Calandriello (2024). Dataset: A general theoretical paradigm to understand learning from human preferences. https://doi.org/10.57702/lafwgps7

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2405.20304
Citation	https://doi.org/10.48550/arXiv.2407.01648
Author	Mohammad Gheshlaghi Azar
More Authors	Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, Daniele Calandriello
Homepage	https://arxiv.org/abs/2310.12036