Continuing the previous post, we prove the claimed minimax lower bound. We start with a useful result that quantifies the difficulty of identifying whether or not an observation is drawn from similar distributions $P$ and $Q$ defined over the same Continue Reading

## Optimality concepts and information theory

In this post we introduce the concept of minimax regret and reformulate our previous result on the upper bound on the worst-case regret of UCB in terms of the minimax regret. We briefly discuss the strengths and weaknesses of using Continue Reading

## The Upper Confidence Bound Algorithm

We now describe the celebrated Upper Confidence Bound (UCB) algorithm that overcomes all of the limitations of strategies based on exploration followed by commitment, including the need to know the horizon and sub-optimality gaps. The algorithm has many different forms, Continue Reading

## First steps: Explore-then-Commit

With most of the background material out of the way, we are almost ready to start designing algorithms for finite-armed stochastic bandits and analyzing their properties, and especially the regret. The last thing we need is an introduction to concentration Continue Reading

## Bandits: A new beginning

Dear Interested Reader, Together with Tor, we have worked a lot on bandit problems in the past and developed a true passion for them. At the pressure of some friends and students (and a potential publisher), and also just to Continue Reading