RL Boosting via Weak Supervised Learning

The paper uses a reinforcement learning dataset. The goal is to learn a policy that maximizes the expected return in a Markov decision process (MDP).
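
To make "expected return" concrete, here is a minimal sketch of how it could be estimated from logged episodes. It assumes a discounted return with a discount factor `gamma`; the source does not say whether the paper uses a discounted or undiscounted objective, and the function names `discounted_return` and `average_return` are hypothetical, not taken from the paper or dataset.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Compute sum_t gamma**t * r_t for one episode's reward sequence.

    `gamma` is an assumed discount factor; set gamma=1.0 for the
    undiscounted return.
    """
    g = 0.0
    # Accumulate backwards so each step applies one multiplication by gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def average_return(episodes: Sequence[Sequence[float]], gamma: float = 0.99) -> float:
    """Monte Carlo estimate of the expected return: the mean over episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(discounted_return(ep, gamma) for ep in episodes) / len(episodes)
```

For example, `average_return([[1.0, 1.0, 1.0], [0.0, 0.0, 4.0]], gamma=0.5)` averages the per-episode discounted returns of two toy reward sequences. A policy that maximizes the expected return is one whose episodes score highest under this kind of average.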

BibTeX: