-
Policy Optimization for Low-rank MDPs (POLO)
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
-
Solving Robust MDPs through No-Regret Dynamics
A robust MDP is a Markov Decision Process in which the goal is to find a policy π that maximizes the value function under worst-case transition dynamics.
-
Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward
We investigate an infinite-horizon, average-reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback.
-
Posterior Sampling for Reinforcement Learning
The dataset used in the paper is a randomly generated finite-horizon Markov Decision Process (MDP) with states S, actions A, and horizon τ.
-
BRIDGE dataset
The BRIDGE dataset is a collection of 155 deterministic MDPs, each with a horizon of 100 time steps. The dataset is used to evaluate the performance of reinforcement learning...
-
Four Rooms
The Four Rooms environment is a stochastic version of the classic Four Rooms gridworld. The environment has 104 states and 4 actions, and the agent can move in any of the 4...
-
Markov Decision Process
The dataset used in the paper is a Markov Decision Process in which states take values in a state space X; in each state x ∈ X, we can take an action u ∈ U,...
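
The state and action counts given in the Four Rooms entry (104 states, 4 actions) can be checked against the 13x13 layout commonly used for this gridworld. The sketch below is a minimal Python illustration under assumptions: the map itself and the names `LAYOUT`, `ACTIONS`, `build_states`, and `step` are hypothetical and not taken from the listed source, and the stochastic dynamics mentioned in the entry are left out because its description is truncated.

```python
# Assumed layout: '#' is a wall, '.' is a free cell (one state per free cell).
LAYOUT = """
#############
#.....#.....#
#.....#.....#
#...........#
#.....#.....#
#.....#.....#
##.####.....#
#.....###.###
#.....#.....#
#.....#.....#
#...........#
#.....#.....#
#############
"""

# The four movement actions as (row delta, column delta).
ACTIONS = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def build_states(layout):
    """Return the list of (row, col) coordinates of all free cells."""
    rows = layout.strip().splitlines()
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, ch in enumerate(row)
        if ch == "."
    ]


def step(state, action, free_cells):
    """Deterministic move: stay in place when the target cell is a wall."""
    dr, dc = ACTIONS[action]
    target = (state[0] + dr, state[1] + dc)
    return target if target in free_cells else state


states = build_states(LAYOUT)
free_cells = set(states)
print(len(states), len(ACTIONS))  # prints: 104 4
```

A stochastic variant would wrap `step` so that, with some probability, the chosen action is swapped for another one; that probability is not stated in the entry, so it is not guessed here.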