Policy Optimization for Low-rank MDPs (POLO)

doi:10.57702/p5oat7zd

Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li (2024). Dataset: Policy Optimization for Low-rank MDPs (POLO). https://doi.org/10.57702/p5oat7zd

DOI retrieved: December 17, 2024

Additional Info

Field	Value
Created	December 17, 2024
Last update	December 17, 2024
Defined In	https://doi.org/10.48550/arXiv.2311.07876
Author	Canzhe Zhao
More Authors	Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li