Lifelong Hyper-Policy Optimization with Multiple Importance Sampling
The authors propose a lifelong reinforcement learning (RL) approach that learns a hyper-policy: a mapping that takes time as input and outputs the parameters of the policy to be queried at that time.
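To make the idea of a time-indexed hyper-policy concrete, here is a minimal sketch. It is not the authors' implementation: the Gaussian form, the time features (`time_features`), the linear policy (`policy_action`), and all numeric values are assumptions chosen only for illustration.

```python
import math
import random


def time_features(t, period=10.0):
    """Hypothetical features of time: bias, linear trend, one sinusoid."""
    return [1.0, float(t), math.sin(2.0 * math.pi * t / period)]


class HyperPolicy:
    """Maps a time step t to sampled policy parameters theta_t.

    Assumption: a Gaussian whose mean is linear in time_features(t),
    with a fixed standard deviation sigma shared by all parameters.
    """

    def __init__(self, weights, sigma, seed=0):
        self.weights = weights  # one row of feature weights per policy parameter
        self.sigma = sigma
        self.rng = random.Random(seed)

    def mean(self, t):
        phi = time_features(t)
        return [sum(w * f for w, f in zip(row, phi)) for row in self.weights]

    def sample(self, t):
        # Draw the parameters of the policy to be queried at time t.
        return [self.rng.gauss(m, self.sigma) for m in self.mean(t)]


def policy_action(theta, state):
    """Hypothetical deterministic linear policy: action = <theta, state>."""
    return sum(th * s for th, s in zip(theta, state))


# Illustrative values only.
hyper = HyperPolicy(weights=[[0.5, 0.1, 0.0], [-1.0, 0.0, 2.0]], sigma=0.1)
theta_t = hyper.sample(t=3)                       # policy parameters for time 3
action = policy_action(theta_t, state=[1.0, 0.5])  # query that policy
```

In this sketch the learnable quantities would be `weights` (and possibly `sigma`), while the policy itself is fully determined by the parameters the hyper-policy emits for the current time.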