-
Atari 2600 Game
The dataset used in the paper is an Atari 2600 game, where the agent receives a reward of 1 when a point is scored and 0 otherwise.
-
Gridworld Environment
The dataset used in the paper is a gridworld environment, where an agent attempts to navigate to a goal block. Observations are 11x11 greyscale images, and the agent receives...
-
D4RL Benchmark
The D4RL benchmark dataset consists of four offline logged datasets, each collected by a different behavior policy or mixture of behavior policies.
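
To make two of the descriptions above concrete, here is a minimal sketch of the binary point-scored reward from the Atari entry and of an 11x11 greyscale observation like the one in the gridworld entry. The function names, the pixel intensities, and the idea of detecting a point by comparing consecutive scores are all assumptions for illustration and do not come from the papers. The gridworld reward is left out because its description is truncated in the source.

```python
from typing import List, Tuple

GRID_SIZE = 11  # from the text: observations are 11x11 greyscale images

# Hypothetical pixel intensities (0-255); the source does not specify them.
EMPTY, AGENT, GOAL = 0, 128, 255


def point_reward(prev_score: int, new_score: int) -> int:
    """Return 1 if a point was scored on this step, and 0 otherwise.

    Assumption: a scored point shows up as an increase in the game score.
    """
    return 1 if new_score > prev_score else 0


def render_observation(agent: Tuple[int, int], goal: Tuple[int, int]) -> List[List[int]]:
    """Build an 11x11 greyscale image as nested lists, indexed [row][col]."""
    image = [[EMPTY] * GRID_SIZE for _ in range(GRID_SIZE)]
    goal_row, goal_col = goal
    agent_row, agent_col = agent
    image[goal_row][goal_col] = GOAL
    # The agent is drawn last, so it covers the goal if both share a cell.
    image[agent_row][agent_col] = AGENT
    return image
```

For example, `render_observation(agent=(0, 0), goal=(10, 10))` gives an 11-row image with the agent in the top-left cell and the goal in the bottom-right cell, and `point_reward(3, 4)` returns 1 under the assumption stated in its docstring.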