Reward hacking
Data and Resources
- Original Metadata (JSON): The JSON representation of the dataset with its distributions based on DCAT.
Cite this as
Alexander Pan, Kush Bhatia, Jacob Steinhardt (2025). Dataset: Reward hacking. https://doi.org/10.57702/ygbr0a0x
DOI retrieved: January 3, 2025
Additional Info
Field | Value |
---|---|
Created | January 3, 2025 |
Last update | January 3, 2025 |
Defined In | https://doi.org/10.48550/arXiv.2201.03544 |
Author | Alexander Pan |
More Authors | Kush Bhatia, Jacob Steinhardt |
Homepage | https://github.com/aypan17/reward-misspecification |