-
Replay Buffer
The dataset used in the paper is a replay buffer containing observations from a navigation task. -
MuJoCo Environment
The dataset used in the paper is a MuJoCo environment with 13 states and 4 control inputs, whose nonlinear dynamics depend polynomially on the control inputs. -
Corridor Environment
The corridor environment is a simple environment where the agent has to determine whether the rewarding cell (colored yellow) is at the top or bottom, based on the color of the... -
Guard: A safe reinforcement learning benchmark
The dataset used in the paper is a collection of robot locomotion tasks with various constraints. -
State-wise Constrained Policy Optimization
State-wise Constrained Policy Optimization (SCPO) is a general-purpose policy search algorithm for state-wise constrained reinforcement learning. -
Defense Against Reward Poisoning Attacks in Reinforcement Learning
We study defense strategies against reward poisoning attacks in reinforcement learning. -
A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Obser...
The proposed network contains clustering layers, based on earlier work by Afshar et al., 2020 and Bethi et al., 2022, with an introduction of TD-error modulation and eligibility... -
On the Theory of Reinforcement Learning
The dataset is used to study a theory of reinforcement learning (RL) in which the learner receives binary feedback only once at the end of an episode. -
HandManipulateBlock
The HandManipulateBlock environment from the OpenAI Gym robotics suite. -
FetchPickAndPlace and HandManipulateBlock
The FetchPickAndPlace and HandManipulateBlock environments from the OpenAI Gym robotics suite. -
FetchPush, FetchPickAndPlace and HandManipulateBlock
The FetchPush, FetchPickAndPlace and HandManipulateBlock environments from the OpenAI Gym robotics suite. -
Dense Reward for Free in RLHF
The paper does not describe the dataset explicitly, but mentions that it is a preference dataset for language models. -
Funnel board
The Funnel board task is a domain where a ball falls through a grid of obstacles onto one of five platforms. Every other row of obstacles consists of funnel-shaped objects,... -
Room runner
The Room runner task is a domain where an agent moves through a randomly generated map of rooms, which are observed in 2D from above. The agent follows the policy of always... -
Event Camera-based Reinforcement Learning
The dataset used in the paper is a simulated environment for event camera-based reinforcement learning. The dataset includes a car-like robot equipped with an event camera, and... -
Alchemy: A structured task distribution for meta-reinforcement learning
The Alchemy benchmark is a meta-learning environment rich enough to contain interesting abstractions, yet simple enough to make fine-grained analysis tractable. -
Sym-Q: Adaptive Symbolic Regression via Sequential Decision-Making
Symbolic regression holds great potential for uncovering underlying mathematical and physical relationships from empirical data. The authors introduce Symbolic Q-network... -
Multiple-confounded-Mujoco-Envs
The dataset used in the paper is a collection of environments with multiple confounders, including mass, length, damping, and a crippled leg. The dataset is used to evaluate the...