Policy Optimization for Low-rank MDPs (POLO)

Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback

Cite this as

Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li (2024). Dataset: Policy Optimization for Low-rank MDPs (POLO). https://doi.org/10.57702/p5oat7zd

DOI retrieved: December 17, 2024

Additional Info

Field          Value
Created        December 17, 2024
Last update    December 17, 2024
Defined In     https://doi.org/10.48550/arXiv.2311.07876
Author         Canzhe Zhao
More Authors   Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li