Interpreting Learned Feedback Patterns in Large Language Models
Data and Resources
- Original Metadata (JSON)
  The JSON representation of the dataset with its distributions, based on DCAT.
Cite this as
Luke Marks, Amir Abdullah, Clement Neo, Rauno Arike, David Krueger, Philip Torr, Fazl Barez (2024). Dataset: Interpreting Learned Feedback Patterns in Large Language Models. https://doi.org/10.57702/aan7igw0
DOI retrieved: December 2, 2024
Additional Info
Field | Value |
---|---|
Created | December 2, 2024 |
Last update | December 2, 2024 |
Author | Luke Marks |
More Authors | Amir Abdullah, Clement Neo, Rauno Arike, David Krueger, Philip Torr, Fazl Barez |
Homepage | https://github.com/apartresearch/Interpreting-Reward-Models |