-
Solving Robust MDPs through No-Regret Dynamics
A robust MDP is a Markov Decision Process in which the goal is to find a policy π that maximizes the value function under worst-case transition dynamics. -
Pong Variants
The dataset used in the paper is a set of Pong variants, including Noisy, Black, White, Zoom, and others. -
3D Maze Games
The dataset used in the paper is a set of 3D maze games, including Labyrinth and others. -
Evolution of Rewards for Food and Motor Action by Simulating Birth and Death
-
Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Huma...
Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Human Feedback -
Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward
We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback. -
Double Tunnel
The dataset used in the paper is for the Double Tunnel environment, which is a safety-critical task. -
Custom Pong Environment
A new Pong environment with a much higher degree of configurability than the current standard, including the ability to compete against a human opponent. -
Guard: A safe reinforcement learning benchmark
The dataset used in the paper is a collection of robot locomotion tasks with various constraints. -
State-wise Constrained Policy Optimization
State-wise Constrained Policy Optimization (SCPO) is a general-purpose policy search algorithm for state-wise constrained reinforcement learning. -
Pretrained Visual Representations in Reinforcement Learning
Visual reinforcement learning (RL) has made significant progress in recent years, but the choice of visual feature extractor remains a crucial design decision. -
DRiLLS: Deep Reinforcement Learning for Logic Synthesis
Logic synthesis requires extensive tuning of the synthesis optimization flow where the quality of results (QoR) depends on the sequence of optimizations used. The authors... -
Interactive Scoring IRL
The dataset used in the paper is a set of trajectories and scores provided by human teachers to train a behavioral policy in a sparse reward environment. -
MuJoCo Continuous Control Tasks
The dataset used in the paper is a collection of data from the MuJoCo continuous control tasks. -
Defense Against Reward Poisoning Attacks in Reinforcement Learning
We study defense strategies against reward poisoning attacks in reinforcement learning. -
A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Obser...
The proposed network contains clustering layers, based on earlier work by Afshar et al., 2020 and Bethi et al., 2022, with an introduction of TD-error modulation and eligibility... -
On the Theory of Reinforcement Learning
The dataset is used to study a theory of reinforcement learning (RL) in which the learner receives binary feedback only once at the end of an episode. -
HandManipulateBlock
The HandManipulateBlock environment from the OpenAI Gym robotics suite. -
FetchPickAndPlace and HandManipulateBlock
The FetchPickAndPlace and HandManipulateBlock environments from the OpenAI Gym robotics suite.