Adversarial bandits

A stochastic bandit with $K$ actions is completely determined by the distributions of rewards, $P_1,\dots,P_K$, of the respective actions. In particular, in round $t$, the distribution of the reward $X_t$ received by a learner choosing action $A_t\in [K]$ is $P_{A_t}$.
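To make the model concrete, here is a minimal sketch of a $K$-armed stochastic bandit environment in Python. The class name `StochasticBandit` and the use of zero-argument sampler callables are illustrative choices, not part of the original text; the point is only that the environment is fully specified by the per-action reward distributions $P_1,\dots,P_K$, and a pull of action $a$ draws an independent sample from $P_a$.

```python
import random

class StochasticBandit:
    """A K-armed stochastic bandit: each action a has a fixed reward
    distribution P_a, and pulling action a samples a reward from P_a."""

    def __init__(self, distributions):
        # distributions: one zero-argument callable per action, each
        # returning an independent sample from that action's P_a
        self.distributions = distributions
        self.K = len(distributions)

    def pull(self, a):
        # The reward X_t for choosing A_t = a is drawn from P_a,
        # independently of everything that happened in earlier rounds.
        return self.distributions[a]()

# Example: a two-armed Bernoulli bandit with means 0.5 and 0.7.
bandit = StochasticBandit([
    lambda: 1.0 if random.random() < 0.5 else 0.0,
    lambda: 1.0 if random.random() < 0.7 else 0.0,
])
reward = bandit.pull(1)  # one sampled reward for action 1
```

Any family of distributions works here, e.g. Gaussian samplers in place of the Bernoulli lambdas; the learner's side of the interaction (choosing $A_t$ from past observations) is deliberately left out of this sketch.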
