In the first case ($UCB_1$ is low), the left-hand side of the inequality is the index of arm 1, right? Shouldn't we use $T_1(t-1)$ in the denominator of the square root?

The high-level argument is as follows. The number of observations for this particular arm $i_{E'}$ is at most about $n/K$ in expectation, so the sample mean for this arm has a standard deviation of about $1/\sqrt{n/K}$. If the mean of some other arm differs from it by less than this standard deviation, the two means cannot be told apart.

The inequality in the previous paragraph makes this precise.
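To put numbers on this, here is a minimal sketch, assuming unit-variance Gaussian rewards (an assumption made only for this illustration, as are the function names). It computes the standard deviation of the sample mean after $n/K$ pulls, and the total KL divergence between the two reward sequences for that arm when its means differ by $\Delta$. With $\Delta = 1/\sqrt{n/K}$ the total divergence stays at $1/2$ regardless of $n$ and $K$, which is one way to see why the two environments are hard to tell apart.

```python
import math


def sample_mean_std(n: int, K: int, sigma: float = 1.0) -> float:
    """Standard deviation of the sample mean after n/K pulls of one arm.

    Assumes i.i.d. rewards with standard deviation sigma (hypothetical setup).
    """
    pulls = n / K
    return sigma / math.sqrt(pulls)


def total_kl_gaussian(delta: float, pulls: float, sigma: float = 1.0) -> float:
    """KL divergence between `pulls` i.i.d. draws from N(mu, sigma^2)
    and from N(mu + delta, sigma^2).

    A single draw contributes delta^2 / (2 sigma^2); independent draws add up.
    """
    return pulls * delta ** 2 / (2.0 * sigma ** 2)


n, K = 10_000, 10
delta = sample_mean_std(n, K)          # the choice Delta = 1 / sqrt(n / K)
kl = total_kl_gaussian(delta, n / K)   # stays near 0.5 for any n and K
print(f"Delta = {delta:.4f}, total KL = {kl:.3f}")
```

Choosing a much larger $\Delta$ would make the divergence grow and the arm easy to identify; a much smaller one would make the regret incurred by confusing the arms negligible.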

I hope this answers your question.

Cheers,

Csaba

Hi, I am stuck on the paragraph below:

“Note that $i_{E'}$ cannot be used more than $\approx n/K$ times on expectation when A is run on E. … To make the two algorithms essentially indistinguishable, set $\Delta = 1/\sqrt{n/K}$.”

I was wondering how to choose the value of $\Delta$ so that the environments are indistinguishable, and how this choice relates to the aforementioned inequality, which is based on repeated integration by parts.

Many thanks for any help.

Thanks for the comment. You are right: $n$ should have been $t$ here. I have edited the page to reflect this.

– Csaba