- Policy Optimization for Low-rank MDPs (POLO)
  Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
- State-wise Constrained Policy Optimization
  State-wise Constrained Policy Optimization (SCPO) is a general-purpose policy search algorithm for state-wise constrained reinforcement learning.
- Policy Optimization for Stochastic Shortest Path
  Policy optimization for stochastic shortest path (SSP) problem, a goal-oriented reinforcement learning model that strictly generalizes the finite-horizon model and better...