On the Theory of Reinforcement Learning

This dataset is used to study a theory of reinforcement learning (RL) in which the learner receives only a single binary feedback signal, delivered once at the end of each episode.
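To make the feedback protocol concrete, here is a minimal Python sketch of one episode in which no per-step reward is observed and a single binary label is assigned to the whole trajectory. Everything in it is an assumption made for illustration and is not taken from the dataset: the integer states, the toy transition, and the names `run_episode` and `toy_feedback` (including its labeling rule) are hypothetical.

```python
from typing import Callable, List, Tuple

State = int
Action = int
Trajectory = List[Tuple[State, Action]]


def run_episode(
    policy: Callable[[State], Action],
    horizon: int,
    feedback_fn: Callable[[Trajectory], int],
) -> Tuple[Trajectory, int]:
    """Roll out one episode and return (trajectory, binary feedback)."""
    state: State = 0
    trajectory: Trajectory = []
    for _ in range(horizon):
        action = policy(state)
        trajectory.append((state, action))
        # Toy deterministic transition (an assumption, not the dataset's dynamics).
        state = state + action
    # No reward was observed during the loop; the learner only sees this
    # single 0/1 label for the entire trajectory.
    return trajectory, feedback_fn(trajectory)


def toy_feedback(trajectory: Trajectory) -> int:
    """Hypothetical labeling rule: 1 if action 1 was taken on at least half the steps."""
    ones = sum(action for _, action in trajectory)
    return int(2 * ones >= len(trajectory))


if __name__ == "__main__":
    always_one = lambda s: 1
    always_zero = lambda s: 0
    print(run_episode(always_one, 4, toy_feedback))
    print(run_episode(always_zero, 4, toy_feedback))
```

Under these assumptions, the learner would have to infer which actions were responsible for the label from the trajectory as a whole, since the loop exposes no intermediate signal.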

BibTeX: