Grid-world environment
The dataset used in the paper is a grid-world environment, modeled as a discrete Markov decision process (MDP). The grid is enclosed by four walls and contains some obstacles, a start state, and a reward state. The agent's goal is to navigate the grid and reach the reward state.
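To make the description concrete, here is a minimal sketch of such an environment in Python. It is not the paper's implementation: the 5x5 layout, the action names, the deterministic transitions, and the reward of 1.0 on reaching the reward state (0.0 otherwise) are all assumptions chosen for illustration.

```python
# Hypothetical layout (not from the paper):
# '#' = wall or obstacle, 'S' = start state, 'G' = reward state, '.' = free cell.
LAYOUT = [
    "#####",
    "#S..#",
    "#.#.#",
    "#..G#",
    "#####",
]

# Assumed action set: four deterministic moves given as (row, column) offsets.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class GridWorld:
    """Illustrative discrete MDP: states are (row, col) cells of the grid."""

    def __init__(self, layout=LAYOUT):
        self.grid = [list(row) for row in layout]
        self.start = self._find("S")
        self.goal = self._find("G")
        self.state = self.start

    def _find(self, symbol):
        # Locate the single cell marked with the given symbol.
        for r, row in enumerate(self.grid):
            for c, cell in enumerate(row):
                if cell == symbol:
                    return (r, c)
        raise ValueError(f"layout has no {symbol!r} cell")

    def _is_free(self, r, c):
        # A cell is enterable if it lies inside the grid and is not a wall/obstacle.
        inside = 0 <= r < len(self.grid) and 0 <= c < len(self.grid[r])
        return inside and self.grid[r][c] != "#"

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        # Move if the target cell is free; otherwise stay in place.
        dr, dc = ACTIONS[action]
        r, c = self.state
        if self._is_free(r + dr, c + dc):
            self.state = (r + dr, c + dc)
        done = self.state == self.goal
        reward = 1.0 if done else 0.0  # assumed sparse reward
        return self.state, reward, done
```

A typical episode calls `reset()` once and then `step(action)` repeatedly until `done` is true. Bumping into a wall or obstacle leaves the agent where it is, which is one common convention; other grid-world variants penalize such moves instead.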
BibTeX: