Finite-Armed Bandits

The dataset used in the paper is a finite-armed bandit problem in which the learner aims to select satisficing arms (arms whose mean reward exceeds a given threshold) as often as possible.
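
To make the objective concrete, here is a minimal sketch of how satisficing arms and the rate of satisficing selections could be computed. It is an illustration only, not the paper's method or this dataset's format: the Bernoulli reward model, the uniform-random policy, and all function and parameter names (`satisficing_arms`, `satisficing_rate`, `run_uniform_policy`, `means`, `threshold`, `horizon`) are assumptions introduced for this example.

```python
import random


def satisficing_arms(means, threshold):
    """Return indices of arms whose mean reward strictly exceeds the threshold."""
    return [i for i, mu in enumerate(means) if mu > threshold]


def satisficing_rate(choices, means, threshold):
    """Fraction of rounds in which the chosen arm was satisficing."""
    good = set(satisficing_arms(means, threshold))
    return sum(1 for arm in choices if arm in good) / len(choices)


def run_uniform_policy(means, horizon, seed=0):
    """Baseline policy (assumed for illustration): pick an arm uniformly at random.

    Rewards are drawn as Bernoulli(means[arm]); this reward model is an
    assumption, not something stated in the dataset description.
    """
    rng = random.Random(seed)
    choices, rewards = [], []
    for _ in range(horizon):
        arm = rng.randrange(len(means))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        choices.append(arm)
        rewards.append(reward)
    return choices, rewards
```

With hypothetical means `[0.2, 0.5, 0.7, 0.9]` and threshold `0.6`, arms 2 and 3 are satisficing. A learner that does well on this problem should reach a satisficing rate well above that of the uniform baseline, which picks a satisficing arm about half the time in this configuration.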

BibTeX: