- Grid World
The dataset used in the paper is a reinforcement learning dataset, specifically a Markov Decision Process (MDP) with a finite set of states and actions.
- Grid World Navigation Task
The dataset used in the paper is a grid world navigation task with four actions: up, down, left, and right. Transitions are stochastic: with 5% probability the agent moves...
- Toy Example Dataset
The dataset used in the paper is a toy example, consisting of a 10x10 grid world, with the agent at position (0, 0). Obstacles are randomly positioned, at an obstacle to free...
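
The grid worlds summarized in this list can be illustrated with a minimal sketch: a 10x10 grid, an agent starting at (0, 0), randomly placed obstacles, and the four actions up, down, left, and right. The summaries are truncated, so several details here are assumptions rather than the papers' setups: `obstacle_fraction` is a hypothetical density, the slip behaviour (a uniformly random action replaces the chosen one) is a guess at what the 5% stochastic transition does, and the names `make_grid` and `step` are illustrative only.

```python
import random

# The four actions named in the navigation-task entry, as (row, col) offsets.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def make_grid(size=10, obstacle_fraction=0.2, seed=0):
    """Return a set of obstacle cells for a size x size grid.

    obstacle_fraction is a hypothetical value; the source does not state one.
    The start cell (0, 0) is always left free.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size) if (r, c) != (0, 0)]
    n_obstacles = int(obstacle_fraction * len(cells))
    return set(rng.sample(cells, n_obstacles))


def step(state, action, obstacles, size=10, slip_prob=0.05, rng=random):
    """Apply one move; a blocked or out-of-bounds move leaves the agent in place.

    Assumption: with probability slip_prob a uniformly random action replaces
    the chosen one. The source truncates what actually happens on a slip.
    """
    if rng.random() < slip_prob:
        action = rng.choice(list(ACTIONS))
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
    if not in_bounds or nxt in obstacles:
        return state
    return nxt
```

Calling `step((0, 0), "right", make_grid())` returns the next cell, or the current cell if the move is blocked. Setting `slip_prob=0.0` makes the transitions deterministic, which is convenient for checking the movement rules in isolation.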