-
Mountain Climbing Tasks
The dataset used in the paper is a collection of Mountain Climbing tasks that involve climbing a mountain using a robotic arm. -
Object-pusher environment
The dataset used in the paper is a simulated object-pusher environment. -
Target Stacking
A synthetic block-stacking environment with physics simulation in which the agent can learn block stacking end-to-end through trial and error, bypassing the need to explicitly model the... -
Reinforcement learning to optimize long-term user engagement in recommender s...
A method for optimizing long-term user engagement in recommender systems. -
Rethinking reinforcement learning for recommendation: A prompt perspective
A prompt-based approach for sequential recommendation. -
DARLEI: Deep Accelerated Reinforcement Learning with Evolutionary Intelligence
DARLEI is a framework that combines evolutionary algorithms with parallelized reinforcement learning for efficiently training and evolving populations of UNIMAL agents. -
Grid World
The dataset used in the paper is a reinforcement learning dataset, specifically a Markov Decision Process (MDP) with a finite set of states and actions. -
Human-Human Trajectories
The dataset used in the paper is a set of human-human trajectories for training a behavioral cloning model. -
Temporally Layered Architecture (TLA) for Adaptive, Distributed and Continuou...
The dataset used in the paper on the Temporally Layered Architecture (TLA) for adaptive, distributed and continuous control. -
Self-Learning Search Engine (SLSE) dataset
The dataset used in this paper is a multimedia search engine dataset for the Self-Learning Search Engine (SLSE), an architecture based on reinforcement learning. -
Waypoints and Edges
The dataset used in the paper is a set of waypoints and edges for planning. -
2D Environment
The dataset used in the paper is a 2D environment where experiments are done. -
Policy Gradients using Variational Quantum Circuits
Variational Quantum Circuits are being used as versatile Quantum Machine Learning models. Some empirical results exhibit an advantage in supervised and generative learning... -
Replay Buffer
The dataset used in the paper is a replay buffer containing observations from a navigation task. -
Bank Heist
The Bank Heist environment is a 2D maze with four rooms, where the objective is to navigate to banks distributed across the four mazes. -
Noisy MNIST
The MNIST environment does not elicit any actions from an agent. Instead, the prediction network simply needs to learn one-step mappings between pairs of MNIST handwritten digits. -
MuJoCo Environment
The dataset used in the paper is a MuJoCo environment with 13 states, 4 control inputs, and nonlinear dynamics that depend polynomially on the control inputs. -
Corridor Environment
The corridor environment is a simple environment where the agent has to determine whether the rewarding cell (colored yellow) is at the top or bottom, based on the color of the... -
Policy Optimization for Low-rank MDPs (POLO)
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback