-
Posterior Sampling for Reinforcement Learning
The dataset used in the paper is a random finite-horizon Markov decision process (MDP) with states S, actions A, and horizon τ. -
Mirror-Reversal and Rotation Tasks
The dataset used in the paper is a set of mirror-reversal and rotation tasks, used to test the performance of different reinforcement learning algorithms. -
Simulated Arm Reaching Task
The dataset used in the paper is a simulated biomechanical model of the human arm, used to test the performance of different reinforcement learning algorithms. -
Quantifying Multimodality in World Models
Multimodality in World Models -
OpenAI Gym’s MuJoCo benchmark
The dataset used in this paper is a set of demonstrations for reinforcement learning, containing safe and unsafe trajectories. -
DEFENDER: DTW-Based Episode Filtering Using Demonstrations for Enhancing RL S...
The dataset used in this paper is a set of demonstrations for reinforcement learning, containing safe and unsafe trajectories. -
Reward-Sharing Relational Networks in Multi-Agent Reinforcement Learning
Reward-Sharing Relational Networks in Multi-Agent Reinforcement Learning as a Framework for Emergent Behavior -
Cart-pole problem dataset
The dataset used for the cart-pole problem consists of a finite set of states S, a finite set of actions A, a state transition probability matrix P, a reward function R, and a... -
DSSE: a Drone Swarm Search Environment
A Drone Swarm Search environment, based on PettingZoo, intended for use with multi-agent (or single-agent) reinforcement learning algorithms. -
Tensor and Matrix Low-Rank Value-Function Approximation in Reinforcement Lear...
Value-function (VF) approximation is a central problem in Reinforcement Learning (RL). Classical non-parametric VF estimation suffers from the curse of dimensionality. As a... -
Self-supervised Relational RL with Independently Controllable Subgoals
The dataset used in the paper is a multi-object environment with a robotic arm and multiple objects to manipulate. The agent learns to control the objects independently and... -
Inverted Pendulum
The dataset used in the paper is based on the inverted pendulum, a standard benchmark system in control and reinforcement learning. -
Physics-informed reinforcement learning via probabilistic co-adjustment funct...
The dataset used in the paper is a physics-informed reinforcement learning dataset, where the goal is to learn to control a biomechanical human arm using only a two-link arm... -
ProcGen Maze and Leaper
The ProcGen Maze and Leaper environments are procedurally generated game environments used to benchmark reinforcement learning. -
Policy Optimization for Stochastic Shortest Path
Policy optimization for the stochastic shortest path (SSP) problem, a goal-oriented reinforcement learning model that strictly generalizes the finite-horizon model and better... -
MineRL-v0 dataset
The MineRL-v0 dataset contains human demonstration data for tasks in Minecraft. -
MineRL BASALT competition
The MineRL BASALT competition dataset contains human demonstration data for four tasks in Minecraft. -
CartPole dataset
The dataset used in the paper is the CartPole environment with its state space artificially expanded to 500 dimensions using a union of five different nonlinear...
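
Several entries above describe their data as a tabular MDP: a state set S, an action set A, a transition matrix P, a reward function R, and (for the finite-horizon case) a horizon τ. As a rough illustration only, the sketch below samples one such random MDP and solves it by backward induction. The sizes, the uniform reward range, and the way transition rows are normalised are all assumptions made for this example; the listing does not say how any of the papers actually generate their MDPs, and the helper names `make_random_mdp` and `backward_induction` are hypothetical.

```python
import random


def make_random_mdp(n_states, n_actions, seed=0):
    """Sample a random tabular MDP (assumed construction, not from any paper).

    Returns P and R where P[s][a] is a probability distribution over next
    states and R[s][a] is a reward in [0, 1).
    """
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        p_row, r_row = [], []
        for _ in range(n_actions):
            # Small offset keeps the normaliser strictly positive.
            weights = [rng.random() + 1e-9 for _ in range(n_states)]
            total = sum(weights)
            p_row.append([w / total for w in weights])
            r_row.append(rng.random())
        P.append(p_row)
        R.append(r_row)
    return P, R


def backward_induction(P, R, horizon):
    """Compute optimal finite-horizon values V[t][s] and a greedy policy pi[t][s]."""
    n_states, n_actions = len(P), len(P[0])
    V = [[0.0] * n_states for _ in range(horizon + 1)]  # V[horizon] is all zeros
    pi = [[0] * n_states for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(n_states):
            # One-step lookahead value of each action at time t.
            q = [
                R[s][a] + sum(P[s][a][s2] * V[t + 1][s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=q.__getitem__)
            pi[t][s] = best
            V[t][s] = q[best]
    return V, pi


# Example: 5 states, 3 actions, horizon 10 (all arbitrary choices).
P, R = make_random_mdp(n_states=5, n_actions=3, seed=0)
V, pi = backward_induction(P, R, horizon=10)
```

Here `V[0][s]` is the optimal expected return from state `s` over the full horizon, and `pi[t][s]` is the action chosen at step `t`. Plain lists are used instead of an array library so the sketch has no dependencies.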