However, even after the correction, there is still a flaw in the final distribution-free bound for UCB. This bound also requires the suboptimality gaps to be bounded, right?

Thanks for pointing out the bugs!

Tor

Also, in the UCB regret bound (3), should the last constant be ‘1’ rather than ‘3’?

Thank you!

After equation (7), the post says “An easy way to avoid numerical instability is to incrementally calculate $$\tilde S_{ti}=\hat S_{ti}-\min_j \hat S_{ti}$$…”. Should it not be $$\tilde S_{ti}=\hat S_{ti}-\min_j \hat S_{tj}$$?
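A quick numerical check of why the index matters. It assumes (the quoted excerpt does not say this) that the sampling distribution has the exponential-weights form $$P_{ti} \propto \exp(\eta \hat S_{ti})$$. Subtracting one common constant from every arm leaves that distribution unchanged. With the index as quoted, however, nothing depends on $$j$$, so $$\min_j \hat S_{ti} = \hat S_{ti}$$, every $$\tilde S_{ti}$$ is zero and the distribution collapses to uniform. The helper names, scores and learning rate below are hypothetical.

```python
import math

def exp_weights(scores, eta):
    """Hypothetical helper: distribution with P_i proportional to exp(eta * scores[i])."""
    weights = [math.exp(eta * s) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def shift_by_min_over_arms(scores):
    """Corrected shift: subtract the single constant min_j S_tj from every arm."""
    m = min(scores)
    return [s - m for s in scores]

def shift_as_quoted(scores):
    """Shift as quoted: min over j of S_ti never uses j, so it is S_ti itself
    and every shifted score becomes 0."""
    return [s - min(s for _j in scores) for s in scores]

scores = [3.0, 1.0, 2.5]  # made-up cumulative estimates for 3 arms
eta = 0.5                 # made-up learning rate

p_raw = exp_weights(scores, eta)
p_fixed = exp_weights(shift_by_min_over_arms(scores), eta)
p_quoted = exp_weights(shift_as_quoted(scores), eta)

print([round(p, 4) for p in p_raw])
print([round(p, 4) for p in p_fixed])  # same as p_raw
print(p_quoted)                        # uniform over the 3 arms
```

Because the shift is the same for all arms, it cancels in the normalisation. That is what makes it safe to apply incrementally.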

Unless I am missing something, in the proof of the sparse case, the line that says “By the pigeonhole principle we can choose an $$S$$ of size $$p/2$$ such that $$\sum_{t=1}^n \sum_{i \in S} \mathbb{E}[\mathbb{1}\{A_{ti} = 1\}] \leq np/d$$” is incorrect.

Counterexample: consider the simple strategy that, in each round, picks an action uniformly at random from the set of all $$(p/2)$$-sparse binary vectors. Then $$\mathbb{E}[A_{ti}] = p/(2d)$$, so summing over the $$n$$ rounds and over any subset $$S$$ of size $$p/2$$ gives $$np^2/(4d)$$. This exceeds the claimed $$np/d$$ by a factor of $$p/4$$, so it is strictly larger whenever $$p > 4$$.
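To double-check the arithmetic, here is a brute-force sketch for small, hypothetical sizes ($$n = 100$$, $$d = 12$$, $$p = 8$$; these values are not from the post). It enumerates every $$(p/2)$$-sparse binary vector, so the marginal $$\mathbb{E}[A_{ti}]$$ is exact rather than estimated. By symmetry every coordinate has the same marginal. The sum is therefore the same for every $$S$$ of size $$p/2$$, and no pigeonhole choice of $$S$$ can bring it down to $$np/d$$.

```python
from fractions import Fraction
from itertools import combinations

def marginal(d, k):
    """Exact P(A_ti = 1) when the action is uniform over all k-sparse
    binary vectors in {0,1}^d. By symmetry, checking coordinate 0 suffices."""
    supports = list(combinations(range(d), k))
    hits = sum(1 for s in supports if 0 in s)
    return Fraction(hits, len(supports))

def sum_over_subset(n, d, p):
    """sum_{t=1}^n sum_{i in S} E[A_ti] for any S with |S| = p/2 (p even)."""
    k = p // 2
    return n * k * marginal(d, k)

n, d, p = 100, 12, 8           # hypothetical small sizes
lhs = sum_over_subset(n, d, p)  # should equal n p^2 / (4d)
claimed = Fraction(n * p, d)    # the bound stated in the proof

print(lhs, claimed)  # prints: 400/3 200/3
```

Here the left-hand side is exactly $$p/4 = 2$$ times the claimed bound, which matches the closed form $$np^2/(4d)$$.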

However, there are simply too many books on measure theory itself to list them all; the above should give you the idea. Of these, I think Pollard’s book stands out for its smooth, conversational style, while still being precise.

https://www.amazon.com/Stochastic-Processes-J-L-Doob/dp/0471523690

http://www.cambridge.org/gg/academic/subjects/statistics-probability/probability-theory-and-stochastic-processes/users-guide-measure-theoretic-probability?format=PB