Posterior Sampling for Reinforcement Learning

The dataset used in the paper is a randomly generated finite-horizon Markov decision process (MDP) with state space S, action space A, and horizon τ.
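As a rough illustration of what such an object looks like, the sketch below samples a small tabular finite-horizon MDP and solves it by backward induction. The sampling choices (Dirichlet(1, ..., 1) transition rows, mean rewards uniform on [0, 1]) and the names `random_finite_horizon_mdp` and `optimal_values` are assumptions made for this example; the record above does not state how the MDP in the paper was generated.

```python
import random


def random_finite_horizon_mdp(num_states, num_actions, horizon, seed=0):
    """Sample a random tabular finite-horizon MDP.

    Assumption (not taken from the dataset record): each transition row is a
    Dirichlet(1, ..., 1) draw and each mean reward is uniform on [0, 1].
    """
    rng = random.Random(seed)
    transitions = []  # transitions[s][a] is a probability vector over next states
    rewards = []      # rewards[s][a] is a mean reward in [0, 1]
    for _ in range(num_states):
        t_row, r_row = [], []
        for _ in range(num_actions):
            # Normalised Exp(1) draws yield a Dirichlet(1, ..., 1) sample.
            weights = [rng.expovariate(1.0) for _ in range(num_states)]
            total = sum(weights)
            t_row.append([w / total for w in weights])
            r_row.append(rng.random())
        transitions.append(t_row)
        rewards.append(r_row)
    return {"transitions": transitions, "rewards": rewards, "horizon": horizon}


def optimal_values(mdp):
    """Backward induction: optimal expected return from each state at the first timestep."""
    P, R, tau = mdp["transitions"], mdp["rewards"], mdp["horizon"]
    num_states = len(R)
    values = [0.0] * num_states  # value after the final timestep is zero
    for _ in range(tau):
        values = [
            max(
                R[s][a] + sum(p * v for p, v in zip(P[s][a], values))
                for a in range(len(R[s]))
            )
            for s in range(num_states)
        ]
    return values


mdp = random_finite_horizon_mdp(num_states=4, num_actions=2, horizon=3, seed=1)
start_values = optimal_values(mdp)
```

Here `start_values[s]` is the best achievable expected total reward over τ = 3 steps when starting in state `s`; because per-step rewards lie in [0, 1], each entry lies between 0 and τ.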

Data and Resources

Cite this as

Ian Osband, Benjamin Van Roy, Daniel Russo (2024). Dataset: Posterior Sampling for Reinforcement Learning. https://doi.org/10.57702/shyazt9q

DOI retrieved: December 16, 2024

Additional Info

Field Value
Created December 16, 2024
Last update December 16, 2024
Defined In https://doi.org/10.48550/arXiv.1306.0940
Author Ian Osband
More Authors Benjamin Van Roy, Daniel Russo
Homepage https://arxiv.org/abs/1301.2609