-
Relay Policy Learning
Relay policy learning: Solving long-horizon tasks via imitation and reinforcement learning. -
Reinforcement Learning for (Mixed) Integer Programming: Smart Feasibility Pump
Mixed integer programming (MIP) problems with a linear objective, linear constraints, and integrality constraints. -
Fine-tuning Language Models with Advantage-Induced Policy Alignment
The datasets used in the paper are the Anthropic Helpfulness and Harmlessness dataset and the StackExchange dataset. -
Automated Driving Dataset
The dataset used in the paper covers automated driving scenarios, including occluded intersections and merging. -
Towards True Lossless Sparse Communication in Multi-Agent Systems
The dataset used in the paper is a multi-agent reinforcement learning environment in which agents must communicate with each other to achieve their goals. -
DSAC-C: Constrained Maximum Entropy for Robust Discrete Soft-Actor Critic
DSAC-C: Constrained Maximum Entropy for Robust Discrete Soft-Actor Critic -
MuJoCo Environments with Noise Augmentation
The dataset used in the paper is a set of MuJoCo environments with noise augmentation. -
Car Racing game dataset
The dataset used in this paper is the Car Racing game dataset, which consists of pixel frames of a car racing game. -
OpenAI Gym Environment dataset
The dataset used in this paper is the OpenAI Gym Environment dataset, which consists of various games and environments. -
Atari 2600 games dataset
The dataset used in this paper is the Atari 2600 games dataset, which consists of 50 Atari 2600 games. -
UNAS: Differentiable Architecture Search Meets Reinforcement Learning
UNAS: Differentiable Architecture Search Meets Reinforcement Learning -
Continual World
The Continual World benchmark consists of ten realistic robotic manipulation tasks. -
ML4H Findings Track Collection: Machine Learning for Health (ML4H) 2023
A synthetic dataset for training a family of reinforcement learning (RL) methods that build explainable pathways for differential diagnosis, with anemia as the primary use case.