Off-Policy Deep Reinforcement Learning without Exploration

The dataset used in the paper is a batch of data collected from a fixed batch of data which has already been gathered, without offering further possibility for data collection.

BibTex: