Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

doi:10.57702/696ofodd

We investigate an infinite-horizon average-reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback.

Data and Resources

Original Metadata JSON
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Washim Uddin Mondal, Vaneet Aggarwal (2024). Dataset: Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward. https://doi.org/10.57702/696ofodd

DOI retrieved: December 16, 2024

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2305.02527
Author	Washim Uddin Mondal
More Authors	Vaneet Aggarwal
Homepage	https://arxiv.org/abs/2303.13604