- Bipedal Walker, Acrobot, and Continuous Lunar Lander tasks
The paper uses three reinforcement learning benchmark tasks: Bipedal Walker, Acrobot, and Continuous Lunar Lander.
- Constraint Sampling Reinforcement Learning
The paper uses a set of reinforcement learning environments, including movie recommendation, educational activity sequencing, and HIV treatment.
- BRIDGE dataset
The BRIDGE dataset is a collection of 155 deterministic MDPs, each with a horizon of 100 time steps. The dataset is used to evaluate the performance of reinforcement learning...
- DeepMind Control Suite and PyBullet Environments
The paper uses the DeepMind Control Suite and PyBullet environments.
- The Arcade Learning Environment: An Evaluation Platform for General Agents
The Arcade Learning Environment (ALE) is a lasting and indispensable element of the RL researcher’s toolbox. It is also the focus of our work. Since its inception, hundreds of...
- Visual Grid World Environment and TextWorld domain
The paper uses a visual grid-world environment and the TextWorld domain.
- Archive Distillation
The archive A contains policies parameterized by deep neural networks and trained with PPGA, a state-of-the-art QD-RL method.
- Generating Behaviorally Diverse Policies with Latent Diffusion Models
Quality Diversity (QD) is an emerging field in which collections of high-performing, behaviorally diverse solutions are trained. The foundational method, MAP-Elites, maintains...
- OpenAI Gym and Atari games
The paper does not explicitly describe its dataset, but the authors report experiments on several representative tasks from the OpenAI Gym and...
- Relay Policy Learning
Relay policy learning: Solving long-horizon tasks via imitation and reinforcement learning.
- Reinforcement Learning for (Mixed) Integer Programming: Smart Feasibility Pump
Mixed integer programming (MIP) problems with a linear objective, linear constraints, and integrality constraints.
- Fine-tuning Language Models with Advantage-Induced Policy Alignment
The paper uses the Anthropic Helpfulness and Harmlessness dataset and the StackExchange dataset.
- Automated Driving Dataset
The paper uses an automated driving dataset that includes scenarios with occluded intersections and merging.
- Towards True Lossless Sparse Communication in Multi-Agent Systems
The paper uses a multi-agent reinforcement learning environment in which agents must communicate with each other to achieve their goals.
- DSAC-C: Constrained Maximum Entropy for Robust Discrete Soft-Actor Critic
DSAC-C: Constrained Maximum Entropy for Robust Discrete Soft-Actor Critic
- MuJoCo Environments with Noise Augmentation
The paper uses a set of MuJoCo environments with noise augmentation.
- Car Racing game dataset
The paper uses the Car Racing game dataset, which consists of pixel frames from a car racing game.
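
The BRIDGE entry above describes deterministic MDPs with a fixed horizon of 100 steps. As a rough illustration of what that structure allows, the sketch below computes the best achievable return in a small deterministic tabular MDP by backward induction. The three-state chain, the `optimal_return` function, and the nested-list layout are all hypothetical and chosen only for this example; they are not BRIDGE's actual storage format or evaluation code.

```python
def optimal_return(next_state, reward, horizon, start=0):
    """Finite-horizon backward induction for a deterministic tabular MDP.

    next_state[s][a] is the index of the state reached from s under action a.
    reward[s][a] is the reward for taking action a in state s.
    Returns the maximum undiscounted return over `horizon` steps from `start`.
    """
    num_states = len(next_state)
    value = [0.0] * num_states  # value of each state with 0 steps remaining
    for _ in range(horizon):
        # Transitions are deterministic, so no expectation is needed:
        # Q(s, a) = r(s, a) + V(next(s, a)), and V(s) = max over a of Q(s, a).
        value = [
            max(
                reward[s][a] + value[next_state[s][a]]
                for a in range(len(next_state[s]))
            )
            for s in range(num_states)
        ]
    return value[start]


# Hypothetical toy MDP (an assumption for illustration, not taken from BRIDGE):
# a 3-state chain where action 0 stays put and action 1 moves one state right.
# Only the last state pays a reward of 1 per step.
next_state = [[0, 1], [1, 2], [2, 2]]
reward = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]

best = optimal_return(next_state, reward, horizon=100)
print(best)
```

With a horizon of 100, the agent spends two steps walking to the rewarding state and collects a reward on each of the remaining 98 steps, so the exact optimum can be checked by hand. Exact values like this are available only because the transitions are deterministic and the state space is small enough to enumerate.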