-
Reinforcement Re-ranking with 2D Grid-based Recommendation Panels
A novel Markov decision process (MDP)-based re-ranking model for final-stage recommendation, called Panel-MDP. -
Binary Tree MDP
A binary tree MDP in which the agent must execute a sequence of L uninterrupted UP movements. It is used to test the Successor... -
Procgen Dataset
The dataset used in the experiments, which contains procedurally generated environments. -
Rainbow dataset
The Rainbow dataset used in the paper, named after Rainbow, an agent that combines six extensions to the DQN algorithm. -
City Brain Challenge dataset
The dataset used in the City Brain Challenge competition, containing a real-world city-scale road network and its traffic demand derived from real traffic data. -
Cart-Pole Problem
The cart-pole problem is a classic control problem in robotics and control theory. It is a continuous control problem where the goal is to keep the pole upright by applying a... -
Generalization in Deep Reinforcement Learning for Robotic Navigation by Rewar...
A novel reward function for reinforcement learning and a Soft Actor-Critic algorithm to train a DRL policy in the context of local navigation for autonomous mobile robots in... -
Dynamic Frame Skip Deep Q-Network (DFDQN) dataset
The DFDQN dataset used in the paper consists of 3 Atari games: Seaquest, Space Invaders, and Alien. -
Deep Q-Network (DQN) dataset
The DQN dataset used in the paper consists of 15 classic Atari games. -
SCIMAI-Gym
The SCIM environment proposed in this paper is a stochastic and divergent two-echelon supply chain that includes a factory that can produce various product types, a factory... -
Dual Pendulum Environment
A dual-pendulum environment used in the paper; the task has a continuous action space. -
Atari 2600 Game
An Atari 2600 game in which the agent receives a reward of 1 when a point is scored and 0 otherwise. -
Gridworld Environment
A gridworld environment in which an agent attempts to navigate to a goal block. Observations are 11x11 greyscale images, and the agent receives... -
Seeker environment
Introduces Seeker, a simulated environment for training RL models. The Seeker environment enables the training of models based on visual input to make realistic movement... -
DMC Environments
The paper does not describe the dataset explicitly, but it mentions that the authors used state-based DMC environments (Tunyasuvunakool et al., 2020) for their... -
Proximal Policy Optimization (PPO) dataset
The PPO dataset used in the paper consists of 10 different Atari environments. -
Mountain Car and 4-dimensional Catcher
Two reinforcement learning environments used in this paper: Mountain Car and 4-dimensional Catcher. -
Monas: Multi-objective neural architecture search using reinforcement learning
The authors propose a multi-objective neural architecture search using reinforcement learning. -
Low-Precision Reinforcement Learning
The dataset used in this paper on low-precision reinforcement learning.