State-wise Constrained Policy Optimization
State-wise Constrained Policy Optimization (SCPO) is a general-purpose policy search algorithm for state-wise constrained reinforcement learning. -
Pretrained Visual Representations in Reinforcement Learning
Visual reinforcement learning (RL) has made significant progress in recent years, but the choice of visual feature extractor remains a crucial design decision. -
DRiLLS: Deep Reinforcement Learning for Logic Synthesis
Logic synthesis requires extensive tuning of the synthesis optimization flow where the quality of results (QoR) depends on the sequence of optimizations used. The authors... -
Interactive Scoring IRL
The dataset used in the paper is a set of trajectories and scores provided by human teachers to train a behavioral policy in a sparse reward environment. -
MuJoCo Continuous Control Tasks
The dataset used in the paper is a collection of data from the MuJoCo continuous control tasks. -
Defense Against Reward Poisoning Attacks in Reinforcement Learning
We study defense strategies against reward poisoning attacks in reinforcement learning. -
A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Obser...
The proposed network contains clustering layers, based on earlier work by Afshar et al. (2020) and Bethi et al. (2022), and introduces TD-error modulation and eligibility... -
On the Theory of Reinforcement Learning
The dataset is used to study a theory of reinforcement learning (RL) in which the learner receives binary feedback only once at the end of an episode. -
HandManipulateBlock
The HandManipulateBlock environment from the OpenAI Gym robotics suite. -
FetchPickAndPlace and HandManipulateBlock
The FetchPickAndPlace and HandManipulateBlock environments from the OpenAI Gym robotics suite. -
FetchPush, FetchPickAndPlace and HandManipulateBlock
The FetchPush, FetchPickAndPlace, and HandManipulateBlock environments from the OpenAI Gym robotics suite. -
Dense Reward for Free in RLHF
The paper does not explicitly describe the dataset; it mentions only that it is a preference dataset for language models. -
SAI Dataset
The dataset used for training the SAI agent, containing 7x7 Go games with multiple komi values. -
MuJoCo environments
The paper does not explicitly describe the dataset; it mentions only that the authors used MuJoCo environments from OpenAI Gym. -
OpenAI Gym benchmark
The dataset used in the paper is the OpenAI Gym benchmark, which provides a set of environments for reinforcement learning. -
Funnel board
The Funnel board task is a domain where a ball falls through a grid of obstacles onto one of five platforms. Every other row of obstacles consists of funnel-shaped objects,... -
Room runner
The Room runner task is a domain where an agent moves through a randomly generated map of rooms, which are observed in 2D from above. The agent follows the policy of always... -
Discovering Blind Spots in Reinforcement Learning
The dataset used in the paper is a collection of oracle feedback, which is used to learn a blind spot model of the target world. -
Event Camera-based Reinforcement Learning
The dataset used in the paper is a simulated environment for event camera-based reinforcement learning. The dataset includes a car-like robot equipped with an event camera, and...