However, I can't understand the statement “Now, when $A$ is run on $E$ and $i_E$ is chosen fewer than $n(1-1/K)$ times on expectation then it will have more than $\Delta n(1-1/K) \approx c\sqrt{K}\sqrt{n}$ regret”. Why "more than"? ]]>

I have a small question. I think it might be better to say that the value of 1-δ is called the confidence level (instead of saying δ is the confidence level). δ is sort of like the upper bound on the probability of error that we allow. Also, using 1-δ will possibly make it more consistent with confidence interval terminology used in statistics. Please correct me if I am wrong. Thanks! 🙂

]]>In the first case ($UCB_1$ is low), the left-hand side of the inequality is the index of arm 1, right? Shouldn't we use $T_1(t-1)$ in the denominator of the square root?

]]>The high-level argument is as follows: the number of observations for this particular arm $i_{E'}$ is at most $n/K$, so the sample mean for this arm will have a standard deviation of roughly $1/\sqrt{n/K}$. If some other arm's mean is closer to it than this standard deviation, then the two means cannot be told apart.

The inequality in the previous paragraph makes this more exact.
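
As a rough numerical sanity check of this argument, here is a minimal sketch. It assumes unit-variance Gaussian rewards, and the function names are made up for illustration. It compares the gap $\Delta = 1/\sqrt{n/K}$ with the empirical standard deviation of a sample mean built from $n/K$ observations, and evaluates the regret scale $\Delta n(1-1/K)$ next to $\sqrt{Kn}$.

```python
import math
import random
import statistics


def indistinguishability_gap(n: int, k: int) -> float:
    """Gap Delta = 1/sqrt(n/K), i.e. the standard deviation of a sample
    mean built from n/K unit-variance observations (assumed noise model)."""
    return 1.0 / math.sqrt(n / k)


def regret_scale(n: int, k: int) -> float:
    """Delta * n * (1 - 1/K), which is of order sqrt(K * n)."""
    return indistinguishability_gap(n, k) * n * (1.0 - 1.0 / k)


def empirical_std_of_sample_mean(m: int, trials: int = 2000, seed: int = 0) -> float:
    """Simulate `trials` sample means of m standard Gaussian draws and
    return their standard deviation; it should be close to 1/sqrt(m)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(m))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


n, k = 1000, 10
print("Delta:", indistinguishability_gap(n, k))
print("empirical std of the sample mean:", empirical_std_of_sample_mean(n // k))
print("Delta*n*(1-1/K):", regret_scale(n, k), " sqrt(K*n):", math.sqrt(k * n))
```

With a gap of this size, a shift of $\Delta$ in one arm's mean is about one standard deviation of its sample mean, which is why the two environments are hard to tell apart from $n/K$ observations.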

I hope this answers your question.

Cheers,

Csaba ]]>

Hi, I'm stuck on the paragraph below:

“Note that $i_{E'}$ cannot be used more than $\approx n/K$ times on expectation when $A$ is run on $E$. … To make the two algorithms essentially indistinguishable, set $\Delta = 1/\sqrt{n/K}$.”

I was wondering how the value of $\Delta$ is chosen so that the environments become indistinguishable, and how this choice relates to the aforementioned inequality based on repeated integration by parts.

Many thanks for any help.

]]>