Lifelong Hyper-Policy Optimization with Multiple Importance Sampling

The authors propose a lifelong RL approach that learns a hyper-policy: a mapping that takes time as input and outputs the parameters of the policy to be queried at that time.
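
To make the time-to-parameters mapping concrete, here is a minimal sketch in Python. It is not the authors' implementation. The linear form over sinusoidal time features, the class name HyperPolicy, the weights rho, and the linear policy in act are all assumptions chosen for illustration. Training the hyper-policy (for instance with the multiple importance sampling estimator named in the title) is not shown.

```python
import math
import random


class HyperPolicy:
    """Hypothetical hyper-policy: maps a time step t to policy parameters theta_t."""

    def __init__(self, n_params, n_features=3, seed=0):
        rng = random.Random(seed)
        self.n_features = n_features
        # rho holds the hyper-policy weights, one row per policy parameter.
        # The linear-in-features form is an assumption made for this sketch.
        self.rho = [
            [rng.gauss(0.0, 0.1) for _ in range(2 * n_features + 1)]
            for _ in range(n_params)
        ]

    def features(self, t):
        # Bias term plus sine/cosine features of time (assumed encoding).
        feats = [1.0]
        for k in range(1, self.n_features + 1):
            feats.append(math.sin(k * t))
            feats.append(math.cos(k * t))
        return feats

    def policy_params(self, t):
        # The input is time only; the output is the parameter vector
        # of the policy to be queried at time t.
        phi = self.features(t)
        return [sum(w * f for w, f in zip(row, phi)) for row in self.rho]


def act(theta, state):
    """Hypothetical deterministic linear policy: action = theta . state."""
    return sum(w * s for w, s in zip(theta, state))


hyper = HyperPolicy(n_params=2)
state = [1.0, -0.5]
# At each time step, query the hyper-policy first, then act with the resulting policy.
actions = [act(hyper.policy_params(t), state) for t in range(5)]
```

In this sketch only rho would be learned. The policy parameters are never optimized directly, they are recomputed from t whenever the policy is queried.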

BibTeX: