Could you elaborate more on how the orthogonality condition removes the sqrt(d) from beta?

I also have a question about the relationship between LinUCB and UCB for simple multi-armed bandits (the algorithms in Chapters 19-20 and the algorithm in Chapters 7-9).

I hope my understanding is correct: if a simple stochastic multi-armed bandit is re-formulated as a linear bandit with orthogonal arms, LinUCB can be applied to yield a regret bound of $\sqrt{dn \log(\ldots)}$. However, in this particular case, I believe some constraint is imposed on the rewards of the arms (specifically, Assumption 19.1 (a)). In some sense, the bound on the expected regret given by LinUCB still depends on the scale of the arms' mean rewards, because we need to rescale an arbitrary multi-armed bandit to make it satisfy Assumption 19.1 (a).

The algorithm of Chapters 7-9, on the other hand, does not seem to assume such a constraint. Its bound on the expected regret is also $O(\sqrt{Kn\log(\ldots)})$, but it does not depend on the specific bandit. Of course, this comes at the cost of introducing another term, $\mathrm{Constant} \cdot \sum_i \Delta_i$.

My question is: how can one use the LinUCB argument while still obtaining a bound that resembles that of Chapters 7-9?
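Not from the book, but for concreteness, here is a minimal Python sketch of the reduction mentioned above (the function name and the confidence radius `beta` are my own illustrative choices, not the book's exact $\beta_t$; rewards are assumed Gaussian). The point it illustrates: when the arms are the standard basis of $\mathbb{R}^K$, the design matrix stays diagonal, the least-squares estimate is just the per-arm empirical mean, and LinUCB collapses to an index-based UCB with bonus $\sqrt{\beta_t / N_i(t)}$.

```python
import numpy as np

def linucb_orthogonal(means, n, delta=0.01, seed=0):
    """LinUCB on a K-armed bandit embedded as a linear bandit whose
    action set is the standard basis {e_1, ..., e_K} of R^K.
    With orthogonal arms V_t is diagonal, so the least-squares
    estimate of theta is the vector of per-arm empirical means."""
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.ones(K)   # one fake pull per arm, playing the role of regularisation
    sums = np.zeros(K)
    total_reward = 0.0
    for t in range(1, n + 1):
        # Illustrative confidence radius, not the book's exact beta_t.
        beta = 2.0 * np.log(K * t / delta)
        ucb = sums / counts + np.sqrt(beta / counts)
        a = int(np.argmax(ucb))
        r = means[a] + rng.normal()   # reward with standard Gaussian noise
        counts[a] += 1
        sums[a] += r
        total_reward += r
    regret = n * max(means) - total_reward
    return regret, counts
```

On a two-armed instance with a gap of 0.5, the suboptimal arm is pulled only logarithmically often, so the realized regret stays far below the linear worst case.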

What is wrong with the definition? In the version of Thursday 27th June, 2019, the regret is defined as $R_n = \max_{\phi} \sum_{t=1}^n \left( x_{t,\phi(o_t,\dots,o_{t-m})} - X_{t,A_t} \right)$, which looks fine to me. Actually, the bound should be $\mathbb{E}[R_n] \leq \sqrt{2nk O^m \log(k)}$, so it was indeed incorrect in the exercise, but inside the $\log$ one should have $\log(k)$ and not $\log(O)$ (the number of $\phi$ maps is $k^{O^m}$). I'll correct this in the next version of the book. Thanks for the catch.
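For the record, here is the short counting argument behind the corrected constant (a sketch, presuming the bound comes from an Exp4-style guarantee $\sqrt{2nk\log M}$ over a finite class of $M$ experts):

```latex
% Each map \phi assigns one of k arms to each of the O^m possible
% observation histories, so the number of experts is
%   M = k^{O^m},  hence  \log M = O^m \log k.
% Substituting M into \sqrt{2 n k \log M} gives
\[
  \mathbb{E}[R_n] \le \sqrt{2 n k \log\!\left(k^{O^m}\right)}
                  = \sqrt{2 n k\, O^m \log k},
\]
% with \log k, not \log O, inside the square root.
```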

Best,

Csaba

PS: Sorry for the late reply.

\[\mathbb{E}[R_n] \leq \sqrt{2nkO^m\log(O)}.\]

Please correct me if I am wrong.

In the book chapter about stochastic linear bandits, there is a remark saying that the linear bandit analysis yields the minimax $\sqrt{dT \log( \cdot )}$ bound for UCB.

However, I only manage to get a factor $d$ from $\beta_t$, AND a factor $d$ from the

$$\sum_{i = 1}^d \sum_{t = 1}^T \frac{e_{i,t}}{N_i(t)},$$

yielding a total $d \sqrt{T \dots}$ bound.

What simplification am I missing?
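For reference, here is a sketch of the standard accounting that produces the $d\sqrt{T}$ rate described above, and of where a smaller confidence radius would change it (constants and the exact form of $\beta_t$ omitted; this is my reconstruction of the usual elliptical-potential argument, not the book's exact derivation):

```latex
% Standard LinUCB decomposition via Cauchy-Schwarz:
\[
  R_T \;\le\; \sum_{t=1}^{T} 2\sqrt{\beta_t}\,\|A_t\|_{V_{t-1}^{-1}}
      \;\le\; 2\sqrt{\beta_T}\,
        \sqrt{T \sum_{t=1}^{T} \|A_t\|_{V_{t-1}^{-1}}^2}.
\]
% The elliptical potential lemma bounds the inner sum by O(d \log T),
% and \beta_T = O(d \log T), so
\[
  R_T = O\!\left(\sqrt{d \log T}\cdot\sqrt{T\, d \log T}\right)
      = O\!\left(d\sqrt{T}\log T\right),
\]
% which is exactly the d\sqrt{T} rate in the question: one d from
% \beta_T and one from the potential sum. A dimension-free radius
% \beta_t = O(\log(\cdot)) -- available for orthogonal arms, where each
% coordinate is estimated independently and a union bound over the d
% coordinates only enters inside the logarithm -- would replace the
% first \sqrt{d \log T} by \sqrt{\log(\cdot)}, giving \sqrt{dT \log(\cdot)}.
```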