Gridworld Environment

The dataset used in the paper is a gridworld environment in which an agent attempts to navigate to a goal block. Observations are 11x11 greyscale images. The agent receives a reward of 1 when it reaches and covers the goal block, and a reward of 0 otherwise.
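
To make the observation and reward structure concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The pixel intensities, the four-direction action set, the clipping at grid edges, and the names `render` and `step` are all assumptions made for illustration. Only the 11x11 greyscale observation and the 1/0 reward rule come from the description above.

```python
GRID_SIZE = 11  # observations are 11x11, per the description above

# Assumed greyscale intensities; the source does not specify them.
AGENT_VALUE = 1.0
GOAL_VALUE = 0.5

# Assumed action set: up, down, left, right as (row, col) offsets.
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def render(agent, goal):
    """Return an 11x11 greyscale observation as a nested list of floats."""
    obs = [[0.0] * GRID_SIZE for _ in range(GRID_SIZE)]
    obs[goal[0]][goal[1]] = GOAL_VALUE
    # The agent is drawn last, so it covers the goal when both share a cell.
    obs[agent[0]][agent[1]] = AGENT_VALUE
    return obs


def step(agent, goal, action):
    """Move the agent one cell (clipped to the grid, an assumption).

    Returns (new_agent, observation, reward), where reward is 1.0 only
    when the agent ends the step on the goal cell, and 0.0 otherwise.
    """
    d_row, d_col = MOVES[action]
    row = min(max(agent[0] + d_row, 0), GRID_SIZE - 1)
    col = min(max(agent[1] + d_col, 0), GRID_SIZE - 1)
    new_agent = (row, col)
    reward = 1.0 if new_agent == goal else 0.0
    return new_agent, render(new_agent, goal), reward
```

For example, calling `step((5, 4), (5, 5), 3)` moves the agent one cell to the right onto the goal, so the returned reward is 1.0. Any step that ends elsewhere returns 0.0.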

Cite this as

Eric J. Michaud, Adam Gleave, Stuart Russell (2024). Dataset: Gridworld Environment. https://doi.org/10.57702/cgw6dlil

DOI retrieved: December 3, 2024

Additional Info

Field: Value
Created: December 3, 2024
Last update: December 3, 2024
Defined In: https://doi.org/10.48550/arXiv.2012.05862
Author: Eric J. Michaud
More Authors: Adam Gleave, Stuart Russell
Homepage: https://github.com/HumanCompatibleAI/interpreting-rewards