Policy Optimization for Low-rank MDPs (POLO)

Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback

BibTeX: