Markov Decision Processes - Groups

Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback.
- Dataset
- JSON
Markov Decision Process

The dataset used in the paper is a Markov Decision Process in which the state takes values in a state space X and, in each state x ∈ X, we can take an action u ∈ U,...
- Dataset
- JSON

2 datasets found