The Bayesian approach to learning starts by choosing a prior probability distribution over the unknown parameters of the world. Then, as the learner makes observations, the prior is updated using Bayes' rule to form the posterior, which represents the new … Continue Reading
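The prior-to-posterior update described in this excerpt can be sketched in a few lines. This is an illustrative example, not code from the post: the discrete two-hypothesis coin prior, the `posterior` helper, and the observed data are all invented here to show Bayes' rule in action.

```python
from fractions import Fraction

def posterior(prior, likelihood, data):
    """Bayes' rule over a discrete parameter space (a sketch, not from the post).

    prior: dict mapping theta -> P(theta)
    likelihood(x, theta): P(x | theta)
    """
    post = dict(prior)
    for x in data:
        # Multiply in the likelihood of each observation ...
        for theta in post:
            post[theta] *= likelihood(x, theta)
        # ... and renormalize so the posterior sums to one.
        z = sum(post.values())
        post = {th: p / z for th, p in post.items()}
    return post

# Two hypotheses for a coin's bias; observe three heads (1) and one tail (0).
prior = {Fraction(1, 2): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}
lik = lambda x, th: th if x == 1 else 1 - th
post = posterior(prior, lik, [1, 1, 1, 0])
```

Because the update only multiplies the prior by the likelihood of each observation and renormalizes, the posterior after all the data is the same no matter how the observations are batched.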

The variance of Exp3

In an earlier post we analyzed an algorithm called Exp3 for $k$-armed adversarial bandits for which the expected regret is bounded by \begin{align*} R_n = \max_{a \in [k]} \E\left[\sum_{t=1}^n y_{tA_t} - y_{ta}\right] \leq \sqrt{2n k \log(k)}\,. \end{align*} The setting of … Continue Reading
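For readers who want to experiment, here is a minimal sketch of Exp3 with importance-weighted loss estimates, using the learning rate $\eta = \sqrt{2\log(k)/(nk)}$ that yields the bound above. The `loss_fn` interface and the toy adversary in the usage line are assumptions made for this example, not part of the post.

```python
import numpy as np

def exp3(loss_fn, n, k, rng=None):
    """Exp3 sketch: exponential weights on importance-weighted loss estimates.

    loss_fn(t, a) must return the adversary's loss in [0, 1] for arm a in round t.
    Returns the learner's total realized loss over n rounds.
    """
    rng = np.random.default_rng() if rng is None else rng
    eta = np.sqrt(2 * np.log(k) / (n * k))
    L_hat = np.zeros(k)  # cumulative importance-weighted loss estimates
    total = 0.0
    for t in range(n):
        # Sampling distribution: exponential weights (shifted for stability).
        p = np.exp(-eta * (L_hat - L_hat.min()))
        p /= p.sum()
        a = rng.choice(k, p=p)
        y = loss_fn(t, a)
        L_hat[a] += y / p[a]  # unbiased estimate of the unseen loss vector
        total += y
    return total

# Toy adversary (invented for illustration): arm 0 always suffers loss 0.2,
# the other arms 0.8, so Exp3 should concentrate on arm 0.
total = exp3(lambda t, a: 0.2 if a == 0 else 0.8,
             n=2000, k=3, rng=np.random.default_rng(0))
```

On this instance the learner's total loss ends up well below the $0.8n$ it would suffer on the bad arms, consistent with the $\sqrt{2nk\log(k)}$ regret bound.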

First order bounds for k-armed adversarial bandits

To revive the content on this blog a little, we have decided to highlight some of the new topics covered in the book that we are excited about and that were not previously covered on the blog. In this post … Continue Reading

Adversarial linear bandits and the curious case of the unit ball

According to the main result of the previous post, given any finite action set $\cA$ with $K$ actions $a_1,\dots,a_K\in \R^d$, no matter how an adversary selects the loss vectors $y_1,\dots,y_n\in \R^d$, as long as the action losses $\ip{a_k,y_t}$ are in … Continue Reading
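To make the setting in this excerpt concrete: the adversary picks loss vectors $y_t \in \R^d$, and the loss of action $a_k$ in round $t$ is the inner product $\ip{a_k, y_t}$, of which the learner observes only the entry for the action it played. The specific actions and loss vectors below are illustrative values, not from the post.

```python
import numpy as np

# Toy finite-action adversarial linear bandit instance: K = 3 actions in R^2,
# and adversarially chosen loss vectors for n = 2 rounds (illustrative values).
actions = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.7, 0.7]])      # rows are a_1, ..., a_K
y = np.array([[0.2, 0.8],
              [0.5, 0.1]])           # rows are y_1, ..., y_n

# Full loss table with entries <a_k, y_t>; a bandit learner would only ever
# observe the single entry in each row corresponding to the action it played.
losses = y @ actions.T
```

The condition in the excerpt, that every $\ip{a_k, y_t}$ lies in a bounded range, is exactly a condition on the entries of this table.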