-
Pong, Breakout, Space Invaders, and Seaquest games
The paper uses the Atari games Pong, Breakout, Space Invaders, and Seaquest. -
Atari 2600 Arcade Learning Environment
The dataset used in the paper is the Atari 2600 Arcade Learning Environment. -
Minigrid environment
The dataset used in the paper is the Minigrid environment, which is a 2D grid world with a goal at the bottom-right corner. The agent learns to navigate to the goal using human... -
LunarLander environment
The dataset used in the paper is the LunarLander environment, which is a classic control problem. The agent learns to land a lunar lander using human feedback. -
CartPole environment
The dataset used in the paper is the CartPole environment, which is a classic control problem. The agent learns to balance a pole using human feedback. -
Efficient Reinforcement Learning in Deterministic Systems
The dataset is used to test the Optimistic Constraint Propagation algorithm for reinforcement learning in deterministic systems. -
Treasure World
The Treasure World domain is a 3D navigation domain within the DM Lab framework. The domain consists of one large room filled with 64 objects of multiple types. Whenever an... -
Playing Catan with Cross-dimensional Neural Network
Catan is a strategic board game with many interesting properties, including multi-player, imperfect information, stochasticity, a complex state space structure (hexagonal board... -
Crippled-Ant Environment
The Crippled-Ant Environment is a high-dimensional continuous control environment, where a quadruped aims to attain the highest possible velocity in a limited amount of time. -
PolicyCleanse: Backdoor Detection and Mitigation for Reinforcement Learning
PolicyCleanse: Backdoor Detection and Mitigation for Reinforcement Learning -
RESACT: REINFORCING LONG-TERM ENGAGEMENT
Long-term engagement is preferred over immediate engagement in sequential recommendation as it directly affects product operational metrics such as daily active users (DAUs) and... -
Arcade Learning Environment (ALE) and Gym MuJoCo benchmark
The datasets used in the paper are the Arcade Learning Environment (ALE) and the Gym MuJoCo benchmark. -
Atari RAM Games
The dataset is used to demonstrate the effectiveness of the Discovery of Deep Options (DDO) algorithm in accelerating reinforcement learning. -
A Relearning Approach to Reinforcement Learning for control of Smart Buildings
This paper demonstrates that continual relearning of control policies using incremental deep reinforcement learning can improve policy learning for non-stationary processes. -
Mol-AIR: A Molecular Optimization Framework with Adaptive Intrinsic Rewards
Mol-AIR is a molecular optimization framework with adaptive intrinsic rewards that explores efficiently for effective goal-directed molecular generation. -
Introspection Learning Dataset
The dataset used by the Introspection Learning algorithm consists of a family of subsets of state-action pairs (U_i)_i, which are used to query the oracle ω_π. -
Reinforcement Learning from Human Feedback with Active Queries
Aligning large language models (LLM) with human preference plays a key role in building modern generative models and can be achieved by reinforcement learning from human... -
PASA: Probabilistic Adaptive State Aggregation
The dataset used in the paper is a state aggregation approximation architecture, which is adapted using feedback regarding the frequency with which an agent has visited certain...