RANDPOL

The dataset used in the paper is a continuous Markov decision process (MDP), meaning that both its state space and its action space are continuous.

BibTeX: