Reward hacking

The dataset consists of four reinforcement learning (RL) environments with misspecified rewards: traffic control, COVID response, blood glucose monitoring, and the Atari game Riverraid.

Cite this as

Alexander Pan, Kush Bhatia, Jacob Steinhardt (2025). Dataset: Reward hacking. https://doi.org/10.57702/ygbr0a0x

DOI retrieved: January 3, 2025

Additional Info

Field Value
Created January 3, 2025
Last update January 3, 2025
Defined In https://doi.org/10.48550/arXiv.2201.03544
Author Alexander Pan
More Authors Kush Bhatia, Jacob Steinhardt
Homepage https://github.com/aypan17/reward-misspecification