-
Deep Attention Recurrent Q-Network
The Deep Attention Recurrent Q-Network (DARQN) algorithm was tested on several popular Atari 2600 games: Breakout, Seaquest, Space Invaders, Tutankham, and Gopher. -
Direct preference optimization: Your language model is secretly a reward model
The dataset used in the paper is not explicitly described. However, it is mentioned that the authors used a language model to optimize the performance of a reinforcement... -
Metadrive: Composing diverse driving scenarios for generalizable reinforcemen...
The dataset used in the paper is Metadrive, a driving simulator. -
LightZero: A unified benchmark for Monte Carlo Tree Search in general sequent...
The dataset used in the paper is not explicitly described. However, it is mentioned that the authors used Atari environments and board games to evaluate the proposed algorithm. -
Distributional Reinforcement Learning with Quantile Regression
Distributional reinforcement learning with quantile regression -
Markov Decision Process
The dataset used in the paper is a Markov Decision Process in which states take values in a state space X; in a state x ∈ X, we can take an action u ∈ U,... -
Meta-World and Robomimic
The dataset used in the paper is a robotic manipulation task dataset, which consists of trajectories and preference labels. -
DeepMind Control Suite
The DeepMind Control Suite is a collection of 20 robotic manipulation tasks, each with 5 different environments and 5 different robot parameters. The tasks are designed to test... -
BBRL Activations Dataset
The dataset used in the paper is a collection of activations from a feature extraction network and a reactive network, used to train a Variational Autoencoder (VAE) to learn... -
Deep Reinforcement Learning Based Controller for Active Heave Compensation
Heave compensation is an essential part of various offshore operations. It is used in applications that include on-loading or off-loading systems, offshore drilling,... -
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and... -
Google Research Football (GRF)
The dataset used in the paper is the Google Research Football (GRF) environment. -
Cart-Pole, Pendulum, and Cart-Pole Balance Environments
The dataset used in this paper is a set of three classic control environments: cart-pole swing-up (CPSU), pendulum swing-up (PSU), and cart-pole balance (CPB). -
D4RL Benchmark
The D4RL benchmark dataset consists of four offline logged datasets collected by different behavior policies, either a single policy or a mixture. -
Roboschool
Roboschool is the dataset used by the ACE algorithm for continuous control problems. -
Blocksworld Dataset
The Blocksworld dataset is a photo-realistic environment, where the goal is to move blocks around to achieve a goal state. The dataset consists of 480/2592 possible... -
8-Puzzle Dataset
The dataset used in the paper is an 8-puzzle environment, where the goal is to solve the puzzle by moving tiles around. The dataset consists of 20000 transition inputs, which...
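
To make the idea of an 8-puzzle "transition" concrete, here is a minimal sketch of how pairs of consecutive states could be generated by sliding tiles. It assumes a symbolic encoding (a 9-tuple with 0 as the blank) and a random walk from the solved state; the actual dataset may use a different representation, such as images, and a different sampling scheme. The names `GOAL`, `neighbors`, and `sample_transitions` are hypothetical and are not taken from the paper.

```python
import random

# Assumed encoding: a 3x3 board flattened row by row, with 0 as the blank.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)


def neighbors(state):
    """Return every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    result = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < 3 and 0 <= c < 3:
            target = r * 3 + c
            nxt = list(state)
            # Swap the blank with the adjacent tile.
            nxt[blank], nxt[target] = nxt[target], nxt[blank]
            result.append(tuple(nxt))
    return result


def sample_transitions(n, seed=0):
    """Collect n (state, next_state) pairs along a random walk from GOAL."""
    rng = random.Random(seed)
    state = GOAL
    transitions = []
    for _ in range(n):
        nxt = rng.choice(neighbors(state))
        transitions.append((state, nxt))
        state = nxt
    return transitions
```

Calling `sample_transitions(20000)` would yield a list of the size mentioned above, though only as an illustration of the data shape, not a reproduction of the dataset.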