Stochastic Linear Bandits and UCB
Recall that in the adversarial contextual $K$-action bandit problem, at the beginning of each round $t$ a context $c_t\in \Ctx$ is observed. The idea is that the context $c_t$ may help the learner to choose a better action. This led Continue Reading