Stochastic MDP

The dataset used in the paper is a stochastic Markov decision process (MDP) with four states (|S| = 4) and four actions (|A| = 4). One state is designated as the terminal state, and one of the remaining three states is designated as the starting state. The transition probabilities and the reward function are randomly generated.
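The description above leaves the sampling procedure unspecified, so the following is only a minimal sketch of how an MDP of this shape could be generated, not the authors' method. The function name `make_random_mdp`, the seed, the normalized-uniform transition weights, the reward range [-1, 1], and the absorbing zero-reward terminal state are all assumptions made for illustration.

```python
import random


def make_random_mdp(n_states=4, n_actions=4, seed=0):
    """Return (P, R, start, terminal) for a randomly generated stochastic MDP.

    Hypothetical helper: the sampling choices below are assumptions,
    not taken from the dataset description.
    """
    rng = random.Random(seed)

    # Pick the terminal state, then pick the start state from the remaining ones.
    terminal = rng.randrange(n_states)
    start = rng.choice([s for s in range(n_states) if s != terminal])

    P = []  # P[s][a][s2] = probability of moving from s to s2 under action a
    R = []  # R[s][a] = reward for taking action a in state s
    for s in range(n_states):
        P_s, R_s = [], []
        for a in range(n_actions):
            if s == terminal:
                # Assumption: the terminal state is absorbing with zero reward.
                row = [1.0 if s2 == s else 0.0 for s2 in range(n_states)]
                reward = 0.0
            else:
                # Assumption: draw positive weights and normalize them to a distribution.
                weights = [rng.random() + 1e-12 for _ in range(n_states)]
                total = sum(weights)
                row = [w / total for w in weights]
                # Assumption: rewards are uniform in [-1, 1].
                reward = rng.uniform(-1.0, 1.0)
            P_s.append(row)
            R_s.append(reward)
        P.append(P_s)
        R.append(R_s)
    return P, R, start, terminal


P, R, start, terminal = make_random_mdp()
```

Each `P[s][a]` is a probability distribution over next states, so its entries sum to 1. Changing `seed` yields a different random instance with the same 4-state, 4-action shape.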

Cite this as

Han-Dong Lim, HyeAnn Lee, Donghwan Lee (2024). Dataset: Stochastic MDP. https://doi.org/10.57702/pdj6emd3

DOI retrieved: December 16, 2024

Additional Info

Field         Value
Created       December 16, 2024
Last update   December 16, 2024
Defined In    https://doi.org/10.48550/arXiv.2402.11877
Author        Han-Dong Lim
More Authors  HyeAnn Lee, Donghwan Lee