-
Linear Quadratic Regulator (LQR)
The Linear Quadratic Regulator (LQR) dataset is used to study the sample complexity of model-based and model-free algorithms for policy evaluation and policy optimization.
-
Policy Optimization for Stochastic Shortest Path
Policy optimization for the stochastic shortest path (SSP) problem, a goal-oriented reinforcement learning model that strictly generalizes the finite-horizon model and better...