Bipedal Walker, Acrobot, and Continuous Lunar Lander tasks
The dataset used in this paper consists of reinforcement learning benchmark problems, specifically the Bipedal Walker, Acrobot, and Continuous Lunar Lander tasks.
BRIDGE dataset
The BRIDGE dataset is a collection of 155 deterministic MDPs, each with a horizon of 100 time steps. The dataset is used to evaluate the performance of reinforcement learning...
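Because each BRIDGE MDP is deterministic with a fixed horizon, its optimal values can be computed exactly by backward induction over the time steps. The sketch below illustrates that idea on a tabular MDP. The table layout (`next_state[s][a]`, `reward[s][a]`), the function name `optimal_values`, and the two-state example are hypothetical and are not the dataset's actual file format or API.

```python
from typing import List


def optimal_values(
    next_state: List[List[int]],  # hypothetical layout: next_state[s][a] -> s'
    reward: List[List[float]],    # hypothetical layout: reward[s][a]
    horizon: int,
) -> List[List[float]]:
    """Backward induction for a deterministic, finite-horizon MDP.

    Returns V where V[t][s] is the optimal return obtainable from state s
    at time step t; V[horizon][s] is 0 because no steps remain.
    """
    n_states = len(next_state)
    V = [[0.0] * n_states for _ in range(horizon + 1)]
    # Sweep backwards in time: each value depends only on step t + 1.
    for t in range(horizon - 1, -1, -1):
        for s in range(n_states):
            V[t][s] = max(
                reward[s][a] + V[t + 1][next_state[s][a]]
                for a in range(len(next_state[s]))
            )
    return V


# Hypothetical two-state MDP. In state 0, action 1 moves to state 1 (reward 1).
# In state 1, action 0 stays put (reward 2).
next_state = [[0, 1], [1, 0]]
reward = [[0.0, 1.0], [2.0, 0.0]]
V = optimal_values(next_state, reward, horizon=100)
print(V[0])  # [199.0, 200.0]
```

Determinism is what makes this exact: each (state, action) pair has a single successor, so no expectation over next states is required.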
DeepMind Control Suite and PyBullet Environments
The datasets used in this paper are the DeepMind Control Suite and PyBullet environments.
The Arcade Learning Environment: An Evaluation Platform for General Agents
The Arcade Learning Environment (ALE) is a lasting and indispensable element of the RL researcher’s toolbox. It is also the focus of our work. Since its inception, hundreds of...
Visual Grid World Environment and TextWorld domain
The datasets used in the paper are a visual grid world environment and the TextWorld domain.
Relay Policy Learning
Relay policy learning: Solving long-horizon tasks via imitation and reinforcement learning.
Reinforcement Learning for (Mixed) Integer Programming: Smart Feasibility Pump
The dataset consists of mixed integer programming (MIP) problems with a linear objective, linear constraints, and integrality constraints.
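For reference, a MIP with a linear objective, linear constraints, and integrality constraints can be written in the standard form below. The symbols (cost vector c, constraint matrix A, right-hand side b, and index set I of integer-constrained variables) are our own notation and are not taken from the paper.

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & c^\top x \\
\text{s.t.} \quad & A x \le b, \\
& x_j \in \mathbb{Z} \quad \text{for all } j \in \mathcal{I} \subseteq \{1, \dots, n\}.
\end{aligned}
```

Dropping the integrality constraints yields the LP relaxation. Setting \mathcal{I} = \{1, \dots, n\} yields a pure integer program.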
Fine-tuning Language Models with Advantage-Induced Policy Alignment
The datasets used in the paper are the Anthropic Helpfulness and Harmlessness dataset and the StackExchange dataset.
MuJoCo Environments with Noise Augmentation
The dataset used in the paper consists of a set of MuJoCo environments with noise augmentation.
Car Racing game dataset
The dataset used in this paper is the Car Racing game dataset, which consists of pixel frames of a car racing game.
OpenAI Gym Environment dataset
The dataset used in this paper is the OpenAI Gym Environment dataset, which consists of various games and environments.
Atari 2600 games dataset
The dataset used in this paper is the Atari 2600 games dataset, which consists of 50 Atari 2600 games.
CodeContest
CodeContest is the dataset used in the paper for training and testing the DPO and PPO models.
Cambridge restaurant domain
The dataset used in the paper is the Cambridge restaurant domain from the PyDial toolkit.