Off-Policy Deep Reinforcement Learning without Exploration
The dataset used in the paper is a batch of data collected from a fixed batch of data which has already been gathered, without offering further possibility for data collection.
BibTex:
Before browse our site, please accept our cookies policy