Learning to summarize with human feedback

The paper presents a study on the impact of synthetic data on large language models (LLMs) and proposes a method to steer LLMs towards desirable non-differentiable attributes.

Cite this as

Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei (2024). Dataset: Learning to summarize with human feedback. https://doi.org/10.57702/bakxgny5

DOI retrieved: December 16, 2024

Additional Info

Field          Value
Created        December 16, 2024
Last update    December 16, 2024
Defined In     https://doi.org/10.48550/arXiv.2407.01490
Citation       https://doi.org/10.48550/arXiv.2312.09244
Author         Nisan Stiennon
More Authors   Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei