Google Research Football (GRF)
The dataset used in the paper is the Google Research Football (GRF) environment.
D4RL Benchmark
The D4RL benchmark consists of four offline logged datasets, each collected by a single behavior policy or a mixture of behavior policies.
Roboschool
The dataset used to evaluate the ACE algorithm on continuous control problems.
AndroidEnv
The dataset used in this paper is the AndroidEnv environment.
Playhouse and AndroidEnv
The dataset used in this paper is the Playhouse and AndroidEnv environments.
Human-level control through deep reinforcement learning
The dataset contains data from the paper "Human-level control through deep reinforcement learning."
Training a helpful and harmless assistant with reinforcement learning from hu...
The authors propose a novel approach that incorporates parameter-efficient tuning to better optimize control tokens, thereby benefiting controllable generation.
Boosting reinforcement learning in competitive influence maximization with tra...
The dataset used in the paper is a real-world dataset for influence maximization, which is a combinatorial optimization problem.
Cartpole, Canniballs, and StarCraft II Learning Environment
The paper uses three reinforcement learning environments: Cartpole, Canniballs, and a custom minigame in the StarCraft II Learning Environment.
Arcade Learning Environment
The paper evaluates on 15 Atari video games from the Arcade Learning Environment (ALE).
OpenAI Gym
The paper does not explicitly describe its dataset, but it states that the authors used several continuous control environments from OpenAI Gym.
Gridworld domain
The dataset used in the paper is a simple gridworld domain with pixel-based states.
Toxic-DPO Dataset
The dataset used in the paper is the Toxic-DPO dataset, which is used for reinforcement learning from human feedback.
Anthropic-HH-RLHF Dataset
The dataset used in the paper is the Anthropic-HH-RLHF dataset, which is used for reinforcement learning from human feedback.