Dataset - LDM

Lifelong Hyper-Policy Optimization with Multiple Importance Sampling

The authors propose a lifelong reinforcement learning (RL) approach that learns a hyper-policy: a function that takes time as input and outputs the parameters of the policy to be queried at that time.
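
To make the idea of a time-indexed hyper-policy concrete, here is a minimal sketch. Everything in it is an illustrative assumption rather than the authors' parameterization: the sinusoidal `time_features`, the Gaussian `HyperPolicy` whose mean is linear in those features, and the `linear_policy` that consumes the sampled parameters are all hypothetical names and choices. It shows only the interface described above (time in, policy parameters out) and does not cover the training procedure or the multiple importance sampling estimator.

```python
import math
import random


def time_features(t):
    # Hypothetical time encoding: a bias term plus one sine/cosine pair.
    return [1.0, math.sin(0.1 * t), math.cos(0.1 * t)]


class HyperPolicy:
    """Maps a time step t to the parameters theta of a policy (assumed form)."""

    def __init__(self, weights, sigma=0.0, seed=0):
        self.weights = weights  # one row of feature weights per policy parameter
        self.sigma = sigma      # standard deviation of the Gaussian around the mean
        self.rng = random.Random(seed)

    def mean_parameters(self, t):
        # Mean of each policy parameter is linear in the time features.
        phi = time_features(t)
        return [sum(w * f for w, f in zip(row, phi)) for row in self.weights]

    def sample_parameters(self, t):
        # Draw the policy parameters to be used at time t.
        return [m + self.rng.gauss(0.0, self.sigma) for m in self.mean_parameters(t)]


def linear_policy(theta, state):
    # Deterministic linear policy: the action is the dot product of theta and state.
    return sum(th * s for th, s in zip(theta, state))


hyper = HyperPolicy(weights=[[0.5, 1.0, 0.0], [-0.2, 0.0, 1.0]], sigma=0.1)
theta_t = hyper.sample_parameters(t=0)            # parameters queried at time 0
action = linear_policy(theta_t, state=[1.0, 2.0])
```

Under these assumptions the agent never acts with the hyper-policy directly: at each time step it queries the hyper-policy for parameters and then acts with the resulting policy, so the non-stationarity over time is carried by the hyper-policy alone.
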

You can also access this registry using the API (see API Docs).
