## The variance of Exp3

In an earlier post we analyzed an algorithm called Exp3 for $k$-armed adversarial bandits for which the expected regret is bounded by \begin{align*} R_n = \max_{a \in [k]} \E\left[\sum_{t=1}^n y_{tA_t} - y_{ta}\right] \leq \sqrt{2n k \log(k)}\,. \end{align*} The setting of Continue Reading

## Adversarial linear bandits and the curious case of the unit ball

According to the main result of the previous post, given any finite action set $\cA$ with $K$ actions $a_1,\dots,a_K\in \R^d$, no matter how an adversary selects the loss vectors $y_1,\dots,y_n\in \R^d$, as long as the action losses $\ip{a_k,y_t}$ are in Continue Reading

## Lower Bounds for Stochastic Linear Bandits

Lower bounds for linear bandits turn out to be more nuanced than in the finite-armed case. The big difference is that for linear bandits the shape of the action set plays a role in the form of the regret, not just the Continue Reading

Continuing the previous post, we prove the claimed minimax lower bound. We start with a useful result that quantifies the difficulty of identifying whether or not an observation is drawn from similar distributions $P$ and $Q$ defined over the same Continue Reading