Markov Decision Processes - Groups

Policy Optimization for Low-rank MDPs (POLO)

Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback

Dataset
JSON

Solving Robust MDPs through No-Regret Dynamics

The robust MDP problem is a Markov decision process problem in which the goal is to find a policy π that maximizes the value function under worst-case transition dynamics.

Dataset
JSON
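
The robust MDP objective described in the entry above can be written as a max-min problem. This is a sketch under assumed notation that does not come from the listing: an uncertainty set $\mathcal{P}$ of transition kernels, a discount factor $\gamma \in [0, 1)$, a reward function $r$, and an initial state distribution $\mu$.

```latex
\[
  \pi^{\star} \in \arg\max_{\pi} \; \min_{P \in \mathcal{P}} \; V^{\pi}_{P}(\mu),
  \qquad
  V^{\pi}_{P}(\mu) = \mathbb{E}_{s_0 \sim \mu,\; a_t \sim \pi(\cdot \mid s_t),\; s_{t+1} \sim P(\cdot \mid s_t, a_t)}
  \left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right].
\]
```

The inner minimum picks the worst-case transition dynamics for a given policy, and the outer maximum picks the policy whose worst-case value is largest.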

Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback.

Dataset
JSON

Posterior Sampling for Reinforcement Learning

The dataset used in the paper is a random finite-horizon Markov decision process (MDP) with states S, actions A, and horizon τ.

Dataset
JSON
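
To make the random finite-horizon MDP in the entry above concrete, here is a minimal sketch of how such an instance might be generated and solved by backward induction. The sampling choices (Dirichlet transitions, uniform rewards in [0, 1], step-dependent dynamics) and the function names `random_finite_horizon_mdp` and `backward_induction` are assumptions made for illustration; the listing does not say how the paper constructs its MDPs.

```python
import numpy as np


def random_finite_horizon_mdp(n_states, n_actions, horizon, seed=0):
    """Sample a random tabular MDP (hypothetical construction, not the paper's)."""
    rng = np.random.default_rng(seed)
    # P[h, s, a] is a probability vector over next states.
    # Assumption: each one is drawn from a flat Dirichlet distribution.
    P = rng.dirichlet(np.ones(n_states), size=(horizon, n_states, n_actions))
    # R[h, s, a] is a deterministic reward. Assumption: uniform in [0, 1].
    R = rng.uniform(0.0, 1.0, size=(horizon, n_states, n_actions))
    return P, R


def backward_induction(P, R):
    """Compute optimal values and a greedy policy for a finite-horizon MDP."""
    horizon, n_states, n_actions = R.shape
    V = np.zeros((horizon + 1, n_states))  # V[horizon] = 0 is the terminal value
    policy = np.zeros((horizon, n_states), dtype=int)
    for h in range(horizon - 1, -1, -1):
        # Q[s, a] = R[h, s, a] + sum over s' of P[h, s, a, s'] * V[h + 1, s']
        Q = R[h] + P[h] @ V[h + 1]
        policy[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return V, policy


if __name__ == "__main__":
    P, R = random_finite_horizon_mdp(n_states=5, n_actions=3, horizon=4)
    V, policy = backward_induction(P, R)
    print("Optimal value per start state:", V[0])
    print("Greedy action at step 0:", policy[0])
```

Because rewards lie in [0, 1], every optimal value at step 0 is bounded by the horizon, which gives a quick sanity check on any generated instance.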

BRIDGE dataset

The BRIDGE dataset is a collection of 155 deterministic MDPs, each with a horizon of 100 time steps. The dataset is used to evaluate the performance of reinforcement learning...

Dataset
JSON

Four Rooms

The Four Rooms environment is a stochastic version of the classic Four Rooms gridworld. The environment has 104 states and 4 actions, and the agent can move in any of the 4...

Dataset
JSON