Temporally Layered Architecture (TLA) for Adaptive, Distributed and Continuou...
The dataset used in the Temporally Layered Architecture (TLA) work on adaptive, distributed, and continuous control.
Self-Learning Search Engine (SLSE) dataset
The dataset used in this paper is a multimedia search engine dataset, built around a Self-Learning Search Engine (SLSE) architecture based on reinforcement learning.
Waypoints and Edges
The dataset used in the paper is a set of waypoints and the edges connecting them, used for planning.
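A waypoint-and-edge dataset like this is usually consumed by a graph-search planner. The sketch below is a minimal illustration only, assuming waypoints are string identifiers and every edge carries a non-negative traversal cost. The graph format, `plan_path`, and `example_graph` are hypothetical and not taken from the paper. It uses Dijkstra's algorithm as one standard choice, and the paper may use a different planner.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical format: each waypoint maps to a list of (neighbor, edge_cost) pairs.
Graph = Dict[str, List[Tuple[str, float]]]


def plan_path(graph: Graph, start: str, goal: str) -> Tuple[float, List[str]]:
    """Dijkstra's shortest path over a waypoint/edge graph.

    Returns (total_cost, waypoint_sequence), or (inf, []) if goal is unreachable.
    Assumes all edge costs are non-negative.
    """
    frontier: List[Tuple[float, str]] = [(0.0, start)]
    best_cost: Dict[str, float] = {start: 0.0}
    parent: Dict[str, str] = {}
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            # Walk parent pointers back from the goal to rebuild the path.
            path = [node]
            while path[-1] != start:
                path.append(parent[path[-1]])
            return cost, path[::-1]
        if cost > best_cost.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(frontier, (new_cost, neighbor))
    return float("inf"), []


# Hypothetical toy graph: the direct edges are costlier than the chain A-B-C-D.
example_graph: Graph = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("C", 1.0), ("D", 5.0)],
    "C": [("D", 1.0)],
}
```

On `example_graph`, `plan_path(example_graph, "A", "D")` returns `(3.0, ["A", "B", "C", "D"])`, since the three unit-cost hops beat both the direct A to C edge and the B to D edge.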
2D Environment
The dataset used in the paper is a 2D environment in which the experiments are run.
Policy Gradients using Variational Quantum Circuits
Variational Quantum Circuits are being used as versatile Quantum Machine Learning models. Some empirical results exhibit an advantage in supervised and generative learning...
Replay Buffer
The dataset used in the paper is a replay buffer containing observations from a navigation task.
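A replay buffer stores past experience so that a learner can reuse it out of order. As a minimal sketch, assuming each entry is an (observation, action, reward, next observation, done) transition, which the description above does not specify, a fixed-capacity buffer with uniform sampling could look like this. `Transition`, `ReplayBuffer`, and the demo values are hypothetical names for illustration.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, List


@dataclass
class Transition:
    # Assumed transition layout; the dataset's actual fields may differ.
    observation: Any
    action: Any
    reward: float
    next_observation: Any
    done: bool


class ReplayBuffer:
    """Fixed-capacity FIFO buffer: once full, the oldest transition is evicted."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.storage: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)  # seeded for reproducible sampling

    def add(self, transition: Transition) -> None:
        self.storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement within a single batch.
        return self.rng.sample(list(self.storage), batch_size)

    def __len__(self) -> int:
        return len(self.storage)


# Usage: a capacity-3 buffer keeps only the 3 most recent transitions.
demo = ReplayBuffer(capacity=3)
for step in range(5):
    demo.add(Transition(observation=step, action=0, reward=0.0,
                        next_observation=step + 1, done=False))
batch = demo.sample(2)
```

Using `deque(maxlen=...)` makes eviction of the oldest transition automatic once capacity is reached, so `add` stays constant-time.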
Bank Heist
The Bank Heist environment is a 2D maze with four rooms, where the objective is to navigate to banks distributed across the four mazes.
Noisy MNIST
The MNIST environment does not elicit any actions from an agent. Instead, the prediction network simply needs to learn one-step mappings between pairs of MNIST handwritten digits.
MuJoCo Environment
The dataset used in the paper is a MuJoCo environment with 13 states, 4 control inputs, and nonlinear dynamics that depend polynomially on the control inputs.
Corridor Environment
The corridor environment is a simple environment where the agent has to determine whether the rewarding cell (colored yellow) is at the top or bottom, based on the color of the...
Policy Optimization for Low-rank MDPs (POLO)
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
Solving Robust MDPs through No-Regret Dynamics
The robust MDP problem is a Markov Decision Process problem in which the goal is to find a policy π that maximizes the value function under worst-case transition dynamics.
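To make "worst-case transition dynamics" concrete: the robust value of a state is the best the agent can guarantee when an adversary picks the transition model. The paper approaches this through no-regret dynamics. The sketch below instead uses robust value iteration, a different and simpler technique, under two stated assumptions: the uncertainty set is a finite collection of K candidate transition kernels, and it is (s,a)-rectangular, so the adversary may choose a different kernel for every state-action pair. The function name `robust_value_iteration` and the toy MDP are hypothetical.

```python
import numpy as np


def robust_value_iteration(rewards, kernels, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Robust value iteration over a finite, (s,a)-rectangular uncertainty set.

    rewards: shape (S, A), immediate reward r(s, a).
    kernels: shape (K, S, A, S); kernels[k, s, a] is one candidate
             next-state distribution for the pair (s, a).
    Returns (V, greedy_policy) for the robust Bellman operator
        V(s) = max_a [ r(s, a) + gamma * min_k sum_s' P_k(s' | s, a) V(s') ].
    """
    S, _ = rewards.shape
    V = np.zeros(S)
    for _ in range(max_iters):
        # Expected next-state value under every candidate model: shape (K, S, A).
        next_values = kernels @ V
        # The adversary picks the worst model independently for each (s, a).
        Q = rewards + gamma * next_values.min(axis=0)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = rewards + gamma * (kernels @ V).min(axis=0)
    return V, Q.argmax(axis=1)


# Hypothetical toy MDP: state 1 is absorbing with zero reward.
rewards = np.array([[1.0, 0.5],   # state 0: risky action 0, safe action 1
                    [0.0, 0.0]])  # state 1: nothing to gain
kernels = np.zeros((2, 2, 2, 2))
kernels[:, 1, :, 1] = 1.0  # every model: state 1 stays in state 1
kernels[:, 0, 1, 0] = 1.0  # every model: safe action 1 keeps the agent in state 0
kernels[0, 0, 0, 0] = 1.0  # model 0: risky action 0 also stays in state 0
kernels[1, 0, 0, 1] = 1.0  # model 1: risky action 0 falls into state 1
V, policy = robust_value_iteration(rewards, kernels)
```

In this toy MDP, a planner that trusted model 0 alone would take the risky action in state 0 and value it at 1 / (1 - 0.9) = 10. Because the adversary can instead send that action to the zero-reward absorbing state, the robust policy takes the safe action and V[0] converges to 0.5 / (1 - 0.9) = 5.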
Pong Variants
The dataset used in the paper is a set of Pong variants, including Noisy, Black, White, Zoom, and others.
3D Maze Games
The dataset used in the paper is a set of 3D maze games, including Labyrinth and others.
Evolution of Rewards for Food and Motor Action by Simulating Birth and Death
Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Huma...
Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Human Feedback
Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward
We investigate an infinite-horizon average-reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback.
Double Tunnel
The dataset used in the paper is for the Double Tunnel environment, which is a safety-critical task.
Custom Pong Environment
A new Pong environment with a much higher degree of configurability than the current standard, including the ability to compete against a human opponent.
Guard: A safe reinforcement learning benchmark
The dataset used in the paper is a collection of robot locomotion tasks with various constraints.