The dataset used in the paper is a MuJoCo environment with 13 states and 4 control inputs, whose nonlinear dynamics depend polynomially on the control inputs.
The dataset used in the paper is RLBench, a standard benchmark for vision-based robotics which has been shown to serve as a proxy for real-robot experiments.