Lipschitz Bandits

The dataset used in the paper is an instance of a Lipschitz bandit problem, in which the learner aims to select satisficing arms (arms whose mean reward exceeds a given threshold) as frequently as possible.
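To make the setting concrete, here is a minimal sketch of a satisficing objective on a toy Lipschitz bandit. Everything in it is an assumption for illustration and is not taken from the paper or the dataset: the mean reward function `mean_reward`, the threshold of 0.75, the uniform grid of 11 arms on [0, 1], the Gaussian noise level, and the use of a standard UCB1 rule as the learner. It only shows how "fraction of pulls spent on satisficing arms" can be measured.

```python
import math
import random


def mean_reward(x):
    # Hypothetical 1-Lipschitz mean reward on [0, 1], peaked at x = 0.5.
    return 1.0 - abs(x - 0.5)


def is_satisficing(x, threshold):
    # An arm is satisficing when its mean reward exceeds the threshold.
    return mean_reward(x) > threshold


def run_ucb(n_arms=11, horizon=2000, threshold=0.75, noise=0.1, seed=0):
    """Run UCB1 on a uniform grid of arms and return the fraction of
    rounds in which a satisficing arm was pulled."""
    rng = random.Random(seed)
    arms = [i / (n_arms - 1) for i in range(n_arms)]
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    satisficing_pulls = 0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once before using the confidence bounds.
            k = t - 1
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            k = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = mean_reward(arms[k]) + rng.gauss(0.0, noise)
        counts[k] += 1
        sums[k] += reward
        if is_satisficing(arms[k], threshold):
            satisficing_pulls += 1

    return satisficing_pulls / horizon


if __name__ == "__main__":
    print("fraction of satisficing pulls:", run_ucb())
```

Note that UCB1 targets the best arm rather than the threshold, so it serves here only as a generic baseline; a learner designed for the satisficing objective could stop exploring once it has found any arm above the threshold.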

BibTeX: