Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward
We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback.