Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback.

BibTeX:

@dataset{Washim_Uddin_Mondal_and_Vaneet_Aggarwal_2024,
  abstract = {We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback.},
  author = {Washim Uddin Mondal and Vaneet Aggarwal},
  doi = {10.57702/696ofodd},
  institution = {No Organization},
  keyword = {'MDP', 'composite', 'delayed', 'partially anonymous', 'reward'},
  month = {dec},
  publisher = {TIB},
  title = {Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward},
  url = {https://service.tib.eu/ldmservice/dataset/reinforcement-learning-with-delayed--composite--and-partially-anonymous-reward},
  year = {2024}
}