## Bayesian/minimax duality for adversarial bandits

The Bayesian approach to learning starts by choosing a prior probability distribution over the unknown parameters of the world. Then, as the learner makes observations, the prior is updated using Bayes' rule to form the posterior, which represents the new …
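As a toy illustration of the prior-to-posterior update described in the excerpt, here is a sketch of Bayes' rule for a Bernoulli observation model with a conjugate Beta prior. The choice of the Beta-Bernoulli model, the variable names, and the particular observations are our own illustrative assumptions, not taken from the post.

```python
from fractions import Fraction

# Beta(a, b) prior on the unknown success probability of a Bernoulli source.
a, b = 1, 1                     # Beta(1, 1) is the uniform prior
observations = [1, 0, 1, 1]     # hypothetical 0/1 observations

for y in observations:
    # Bayes' rule with a conjugate prior reduces to counting:
    # a success increments a, a failure increments b.
    a += y
    b += 1 - y

# Posterior is Beta(4, 2); its mean is a / (a + b) = 2/3.
posterior_mean = Fraction(a, a + b)
```

With a conjugate prior the posterior stays in the same family, so the update is exact and cheap; this is why Beta-Bernoulli pairs appear so often in Bayesian bandit examples.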

## The variance of Exp3

In an earlier post we analyzed an algorithm called Exp3 for $k$-armed adversarial bandits for which the expected regret is bounded by \begin{align*} R_n = \max_{a \in [k]} \E\left[\sum_{t=1}^n y_{tA_t} - y_{ta}\right] \leq \sqrt{2n k \log(k)}\,. \end{align*} The setting of …
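For concreteness, a minimal sketch of Exp3 in the loss formulation matching the bound above (losses $y_{ta} \in [0,1]$, loss-based importance-weighted estimator). The function name, the loss-matrix interface, and the fixed learning rate $\eta = \sqrt{2\log(k)/(nk)}$ are our own choices for this sketch, not code from the post.

```python
import numpy as np

def exp3(losses, eta, seed=0):
    """Run Exp3 on a loss matrix of shape (n, k) with losses in [0, 1].

    Returns the sequence of chosen arms and the total loss incurred.
    """
    n, k = losses.shape
    S = np.zeros(k)               # cumulative importance-weighted gain estimates
    rng = np.random.default_rng(seed)
    chosen = np.empty(n, dtype=int)
    total_loss = 0.0
    for t in range(n):
        # Exponential-weights distribution (shifted by the max for stability).
        p = np.exp(eta * (S - S.max()))
        p /= p.sum()
        a = rng.choice(k, p=p)
        chosen[t] = a
        total_loss += losses[t, a]
        # Loss-based estimator: hat{y}_{ti} = 1{A_t = i} * y_{ti} / p_i,
        # accumulated as gains 1 - hat{y}_{ti} so that unplayed arms are
        # implicitly credited with zero loss.
        S += 1.0
        S[a] -= losses[t, a] / p[a]
    return chosen, total_loss
```

On an instance where one arm always has loss 0 and the rest loss 1, the total loss stays within roughly the $\sqrt{2nk\log(k)}$ bound of the best arm's cumulative loss.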

## First order bounds for k-armed adversarial bandits

To revive the content on this blog a little we have decided to highlight some of the new topics covered in the book that we are excited about and that were not previously covered in the blog. In this post …

## Bandit Algorithms Book

Dear readers, after nearly two years since starting to write the blog we have at last completed a first draft of the book, which is to be published by Cambridge University Press. The book is available for free as a …

## Bandit tutorial slides and update on book

This website has been quiet for some time, but we have not given up on bandits just yet. First up, we recently gave a short tutorial at AAAI that covered the basics of finite-armed stochastic bandits and stochastic linear bandits. …

## Ellipsoidal Confidence Sets for Least-Squares Estimators

Continuing the previous post, here we give a construction for confidence bounds based on ellipsoidal confidence sets. We also put things together and show a bound on the regret of the UCB strategy that uses the constructed confidence bounds. Constructing the …

## Lower Bounds for Stochastic Linear Bandits

Lower bounds for linear bandits turn out to be more nuanced than the finite-armed case. The big difference is that for linear bandits the shape of the action set plays a role in the form of the regret, not just the …

## High probability lower bounds

In the post on adversarial bandits we proved two high probability upper bounds on the regret of Exp-IX. Specifically, we showed: Theorem: There exists a policy $\pi$ such that for all $\delta \in (0,1)$ and any adversarial environment $\nu\in [0,1]^{nK}$, …

## Optimality concepts and information theory

In this post we introduce the concept of minimax regret and reformulate our previous result on the upper bound on the worst-case regret of UCB in terms of the minimax regret. We briefly discuss the strengths and weaknesses of using …