In the book chapter about stochastic linear bandits, there is a remark saying that the linear bandit analysis yields the minimax $$\sqrt{dT \log( \cdot )}$$ bound for UCB.

However, I only manage to get a factor $d$ from $\beta_t$, and another factor $d$ from

$$
\sum_{i = 1}^d \sum_{t = 1}^T \frac{e_{i,t}}{N_i(t)},
$$

yielding a total bound of $$d \sqrt{T \dots }$$.

What simplification am I missing?
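[For reference, one common form of the Cauchy–Schwarz step in the linear UCB analysis — a sketch only, and the notation ($V_t$, $\lambda$, the elliptical potential bound) may differ from the chapter's:

```latex
R_T \le \sum_{t=1}^T 2\sqrt{\beta_t}\,\|A_t\|_{V_{t-1}^{-1}}
    \le 2\sqrt{\beta_T \, T \sum_{t=1}^T \|A_t\|^2_{V_{t-1}^{-1}}}
    \le 2\sqrt{\beta_T \, T \cdot 2d \log\!\bigl(1 + T/(d\lambda)\bigr)}.
```

With $\beta_T = O(d \log T)$ this gives $O(d\sqrt{T}\log T)$, matching the factor counting in the comment above.]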

Sorry, I'm not sure how to write math in a comment.

As far as I understand, this post assumes that theta_star is a single parameter vector shared by all arms, whereas in disjoint LinUCB ("A Contextual-Bandit Approach to Personalized News Article Recommendation", http://rob.schapire.net/papers/www10.pdf), theta_star is unique to each arm.

Is that correct? Would the latter algorithm have a bigger regret bound?
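[Since the comment contrasts the shared-theta model with the per-arm one, here is a minimal sketch of the disjoint LinUCB update in the spirit of the linked Li et al. paper — the class name and parameter names are my own, and `alpha` stands in for the confidence width:

```python
import numpy as np

class DisjointLinUCB:
    """Per-arm ridge-regression UCB (the 'disjoint' model): each arm a
    keeps its own theta_a, unlike the single shared theta_star assumed
    in the post."""

    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        # A_a = I_d + sum of x x^T observed when arm a was pulled;
        # b_a = sum of reward * x for arm a.
        self.A = [np.eye(d) for _ in range(n_arms)]
        self.b = [np.zeros(d) for _ in range(n_arms)]

    def select(self, x):
        """Pick the arm maximizing theta_a^T x + alpha * ||x||_{A_a^{-1}}."""
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_a)
            theta_a = A_inv @ b_a            # ridge estimate for this arm
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(theta_a @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        # Only the pulled arm's statistics change.
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

Each arm estimating its own theta_a means the per-arm data is split K ways, which is the intuition behind expecting a larger regret bound than in the shared-parameter model.]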

Or are they meant to be read standalone?
