Lower Bounds for Stochastic Linear Bandits

Lower bounds for linear bandits turn out to be more nuanced than the finite-armed case. The big difference is that for linear bandits the shape of the action set plays a role in the form of the regret, not just the distribution of the noise. This should not come as a big surprise because the stochastic finite-armed bandit problem can be modeled as a linear bandit with actions being the standard basis vectors, $\cA = \set{e_1,\ldots,e_K}$. In this case the actions are orthogonal, which means that samples from one action do not give information about the rewards for other actions. Other action sets such as the sphere ($\cA = S^d = \{x \in \R^d : \norm{x}_2 = 1\}$) do not share this property. For example, if $d = 2$ and $\cA = S^d$ and an algorithm chooses actions $e_1 = (1,0)$ and $e_2 = (0,1)$ many times, then it can accurately estimate $\theta_1$ and $\theta_2$, and hence the expected reward of every other action.

We will prove a variety of lower bounds under different assumptions. The first three have a worst-case flavor, showing what is (not) achievable in general, or under a sparsity constraint, or if the realizability assumption is not satisfied. All of these are proven using the same information-theoretic tools that we have seen in previous results, combined with careful choices of action sets and environment classes. The difficulty is always in guessing the worst case; after that it is a matter of turning the crank on the usual machinery.

Besides the worst-case results we also give an optimal asymptotic lower bound for finite action sets that generalizes the asymptotic lower bound for finite-armed stochastic bandits given in a previous post. The proof of this result is somewhat more technical, but has the same general flavor as the previous asymptotic lower bounds.

We use a simple model with Gaussian noise. For action $A_t \in \cA \subseteq \R^d$ the reward is $X_t = \shortinner{A_t, \theta} + \eta_t$ where $\eta_t\sim \mathcal N(0,1)$ is the standard Gaussian noise term and $\theta \in \Theta \subset \R^d$. The regret of a strategy is:
\begin{align*}
R_n(\cA, \theta) = \E\left[\sum_{t=1}^n \shortinner{x^* - A_t, \theta}\right]\,,
\end{align*}
where $x^* = \argmax_{x \in \cA} \shortinner{x, \theta}$ is the optimal action. Note that the arguments to the regret function differ from those used in some previous posts. In general we include the quantities of interest, which in this case (with a fixed strategy and noise model) are the action set and the unknown parameter. As for finite-armed bandits we define the sub-optimality gap of arm $x \in \cA$ by $\Delta_x = \max_{y \in \cA} \shortinner{y - x, \theta}$ and $\Delta_{\min} = \inf \set{\Delta_x : x \in \cA \text{ and } \Delta_x > 0}$. Note that the latter quantity can be zero if $\cA$ is infinite, but is strictly positive if there are finitely many arms and the problem is non-trivial (there exists a sub-optimal arm). If $A$ is a matrix, then $\lambda_{\min}(A)$ denotes its smallest eigenvalue. We also recall the notation used for finite-armed bandits by defining $T_x(t) = \sum_{s=1}^t \one{A_s = x}$.
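To make the setup concrete, here is a minimal Python sketch of this environment. The class name, the uniformly random strategy and all constants are our own illustrative assumptions, not anything prescribed by the text:

```python
import numpy as np

class LinearBandit:
    """Stochastic linear bandit with unit-variance Gaussian noise."""

    def __init__(self, actions, theta, rng=None):
        self.actions = np.asarray(actions, dtype=float)  # shape (K, d)
        self.theta = np.asarray(theta, dtype=float)      # shape (d,)
        self.rng = rng or np.random.default_rng(0)
        self.means = self.actions @ self.theta
        self.gaps = self.means.max() - self.means        # Delta_x for each action

    def pull(self, k):
        """Play the action with index k and return the noisy reward X_t."""
        return self.means[k] + self.rng.normal()

# Example: the finite-armed case, A = {e_1, ..., e_K}
K, n = 5, 1000
env = LinearBandit(np.eye(K), theta=np.linspace(0.0, 0.5, K))
pseudo_regret = 0.0
for t in range(n):
    k = env.rng.integers(K)        # a uniformly random strategy, for illustration only
    _ = env.pull(k)
    pseudo_regret += env.gaps[k]
print(pseudo_regret)               # expected value is n * mean(gaps)
```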

Worst case bounds

Our worst-case bound relies on a specific action set and shows that the $\tilde O(d \sqrt{n})$ upper bound for the linear version of UCB cannot be improved in general, except (most likely) for the logarithmic factors.

Theorem Let the action set be $\mathcal A = \set{-1, 1}^d$. Then for any strategy there exists a $\theta \in \Theta = \set{-\sqrt{1/n}, \sqrt{1/n}}^d$ such that
\begin{align*}
R_n(\cA, \theta) \geq \frac{\exp(-2)}{4} \cdot d \sqrt{n}\,.
\end{align*}

Proof
For $\theta \in \Theta$ we let $\mathbb{P}_\theta$ denote the measure on outcomes $A_1, X_1,\ldots,A_n,X_n$ induced by the interaction of the strategy and the bandit determined by $\theta$. By the relative entropy identity we have for $\theta, \theta' \in \Theta$ that
\begin{align}
\KL(\mathbb{P}_\theta, \mathbb{P}_{\theta'}) = \frac{1}{2} \sum_{t=1}^n \E_\theta\left[\shortinner{A_t, \theta - \theta'}^2\right]\,.
\label{eq:linear-kl}
\end{align}
For $1 \leq i \leq d$ and $\theta \in \Theta$ define
\begin{align*}
p_{\theta,i} = \mathbb{P}_\theta\left(\sum_{t=1}^n \one{\sign(A_{t,i}) \neq \sign(\theta_i)} \geq n/2\right)\,.
\end{align*}
Now let $1 \leq i \leq d$ and $\theta \in \Theta$ be fixed and let $\theta' = \theta$ except for $\theta'_i = -\theta_i$. Then by the high probability version of Pinsker's inequality and (\ref{eq:linear-kl}) we have
\begin{align}
\label{eq:sphere-kl}
p_{\theta,i} + p_{\theta',i}
&\geq \frac{1}{2} \exp\left(-\frac{1}{2}\sum_{t=1}^n \shortinner{A_t, \theta - \theta'}^2\right)
= \frac{1}{2} \exp\left(-2\right)\,.
\end{align}
Therefore using the notation $\sum_{\theta_{-i}}$ as an abbreviation for $\sum_{\theta_1,\ldots,\theta_{i-1},\theta_{i+1},\ldots,\theta_d \in \set{\pm \sqrt{1/n}}^{d-1}}$,
\begin{align*}
\sum_{\theta \in \Theta} 2^{-d} \sum_{i=1}^d p_{\theta,i}
&= \sum_{i=1}^d \sum_{\theta_{-i}} 2^{-d} \sum_{\theta_i \in \set{\pm \sqrt{1/n}}} p_{\theta,i} \\
&\geq \sum_{i=1}^d \sum_{\theta_{-i}} 2^{-d} \cdot \frac{1}{2}\exp\left(-2\right) \\
&= \frac{d}{4} \exp\left(-2\right)\,.
\end{align*}
Therefore there exists a $\theta \in \Theta$ such that
\begin{align*}
\sum_{i=1}^d p_{\theta,i} \geq \frac{d}{4} \exp\left(-2\right)\,.
\end{align*}
Let $x^* = \argmax_{x \in \cA} \shortinner{x, \theta}$, which satisfies $x^*_i = \sign(\theta_i)$ for each $i$. Then the regret for this choice of $\theta$ is at least
\begin{align*}
R_n(\cA, \theta)
&= \sum_{t=1}^n \E_\theta\left[\shortinner{x^* - A_t, \theta}\right] \\
&= 2\sqrt{\frac{1}{n}} \sum_{i=1}^d \sum_{t=1}^n \mathbb{P}_\theta\left(\sign(A_{t,i}) \neq \sign(\theta_i)\right) \\
&\geq \sqrt{n} \sum_{i=1}^d \mathbb{P}_\theta\left(\sum_{t=1}^n \one{\sign(A_{t,i}) \neq \sign(\theta_i)} \geq n/2 \right) \\
&= \sqrt{n} \sum_{i=1}^d p_{\theta,i}
\geq \frac{\exp(-2)}{4} \cdot d \sqrt{n}\,.
\end{align*}
QED
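The identity used in the second line of the last display, $\shortinner{x^* - A_t, \theta} = 2\sqrt{1/n}\sum_{i=1}^d \one{\sign(A_{t,i}) \neq \sign(\theta_i)}$ for hypercube actions, is easy to verify numerically. The following self-contained check is our own illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 100
theta = rng.choice([-1.0, 1.0], size=d) * np.sqrt(1.0 / n)
x_star = np.sign(theta)                          # optimal action in {-1, 1}^d

actions = rng.choice([-1.0, 1.0], size=(n, d))   # an arbitrary sequence of plays
lhs = np.sum((x_star - actions) @ theta)         # sum_t <x* - A_t, theta>
mismatches = np.sum(np.sign(actions) != np.sign(theta))
rhs = 2.0 * np.sqrt(1.0 / n) * mismatches
print(lhs, rhs)                                  # the two numbers agree
```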

Sparse case

We now tackle the sparse case where the underlying parameter $\theta$ is assumed to satisfy $\norm{\theta}_0 = \sum_{i=1}^d \one{\theta_i \neq 0} \leq p$ for some $p$ that is usually much smaller than $d$. An extreme case is $p = 1$, which essentially reduces to the finite-armed bandit problem, for which the regret is at least $\Omega(\sqrt{dn})$ in the worst case. For this reason we cannot expect too much from sparsity. It turns out that the best one can hope for (in the worst case) is $\Omega(\sqrt{p d n})$, so again the lower bound nearly matches the upper bound of an existing algorithm.

Theorem
Let $2 \leq p\leq d$ be even and define the set of actions $\cA$ by
\begin{align*}
\mathcal A = \set{ x \in \set{0, 1}^d : \sum_{i=1}^d \one{x_i > 0} = \frac{p}{2}}.
\end{align*}
Then for any strategy there exists a $\theta$ with $\norm{\theta}_0 \leq p$ such that
\begin{align*}
R_n \geq \frac{\sqrt{2d p n}}{16} \exp(-1)\,.
\end{align*}

The assumption that $p$ is even is non-restrictive: if $p$ is odd the proof below goes through with $p$ replaced by $p - 1$ and the regret changes only by a small constant factor. The proof relies on a slightly different construction than the previous result, and is fractionally more complicated because of it.

Proof
Let $\epsilon = \sqrt{2d/(p n)}$ and $\theta$ be given by
\begin{align*}
\theta_i = \begin{cases}
\epsilon\,, & \text{if } i \leq p / 2\,; \\
0\,, & \text{otherwise}\,.
\end{cases}
\end{align*}
Given $S \subseteq \{p/2+1,\ldots,d\}$ with $|S| = p/2$ define $\theta’$ by
\begin{align*}
\theta’_i = \begin{cases}
\epsilon\,, & \text{if } i \leq p / 2\,; \\
2\epsilon\,, & \text{if } i \in S \,; \\
0\,, & \text{otherwise}\,.
\end{cases}
\end{align*}
Let $\mathbb{P}_\theta$ and $\mathbb{P}_{\theta’}$ be the measures on the sequence of observations when a fixed strategy interacts with the bandits induced by $\theta$ and $\theta’$ respectively. Then the usual computation shows that
\begin{align*}
\KL(\mathbb{P}_\theta, \mathbb{P}_{\theta’}) = 2\epsilon^2 \E\left[\sum_{t=1}^n \sum_{i \in S} \one{A_{ti} \neq 0}\right]\,.
\end{align*}
By the pigeonhole principle we can choose an $S \subseteq \{p/2+1,\ldots,d\}$ with $|S| = p/2$ in such a way that
\begin{align*}
\E\left[\sum_{t=1}^n \sum_{i \in S} \one{A_{ti} \neq 0}\right] \leq \frac{n p}{d}\,.
\end{align*}
Therefore, using this $S$ and the high-probability version of Pinsker's inequality, we have for any event $A$ that
\begin{align*}
\mathbb{P}_\theta(A) + \mathbb{P}_{\theta’}(A^c) \geq \frac{1}{2} \exp\left(-\KL(\mathbb{P}_\theta, \mathbb{P}_{\theta’})\right)
\geq \frac{1}{2} \exp\left(-\frac{np\epsilon^2}{2d}\right)
\geq \frac{1}{2} \exp\left(-1\right)
\end{align*}
Choosing $A = \set{\sum_{t=1}^n \sum_{i \in S} \one{A_{ti} > 0} \geq np/4}$ leads to
\begin{align*}
R_n(\mathcal A, \theta) &\geq \frac{n\epsilon p}{4} \mathbb{P}_{\theta}(A) &
R_n(\mathcal A, \theta’) &\geq \frac{n\epsilon p}{4} \mathbb{P}_{\theta’}(A^c)\,.
\end{align*}
Therefore
\begin{align*}
\max\set{R_n(\cA, \theta),\, R_n(\cA, \theta’)}
\geq \frac{n\epsilon p}{8} \left(\mathbb{P}_{\theta}(A) + \mathbb{P}_{\theta’}(A^c)\right)
\geq \frac{\sqrt{2ndp}}{16} \exp(-1)\,.
\end{align*}
QED

Unrealizable case

An important generalization of the linear model is the unrealizable case, where the mean rewards are not assumed to follow a linear model exactly. Suppose that $\cA \subset \R^d$ and that the mean reward $\mu_x = \E[X_t|A_t = x]$ does not necessarily satisfy a linear model. It would be very pleasant to have an algorithm such that if $\mu_x = \shortinner{x, \theta}$ for all $x$ and some $\theta \in \R^d$, then
\begin{align*}
R_n(\cA, \mu) = \tilde O(d \sqrt{n})\,,
\end{align*}
while if no $\theta \in \R^d$ satisfies $\mu_x = \shortinner{x, \theta}$ for all $x \in \cA$, then $R_n(\cA, \mu) = \tilde O(\sqrt{nK})$ recovers the bound of UCB. That is, an algorithm that enjoys the bound of OFUL if the linear model is correct, but recovers the regret of UCB otherwise. Of course one could hope for something even stronger, for example that
\begin{align}
R_n(\cA, \mu) = \tilde O\left(\min\set{\sqrt{Kn},\, d\sqrt{n} + n\epsilon}\right)\,, \label{eq:hope}
\end{align}
where $\epsilon = \min_{\theta \in \R^d} \max_{x \in \cA} |\mu_x – \shortinner{x, \theta}|$ is called the approximation error of the class of linear models. Unfortunately it turns out that results of this kind are not achievable. To show this we will prove a generic bound for the classical finite-armed bandit problem, and then show how this implies a lower bound on the ability to be adaptive to a linear model if possible and have acceptable regret if not.

Theorem
Let $\cA = \set{e_1,\ldots,e_K}$ be the standard basis vectors. Now define sets $\Theta, \Theta’ \subset \R^{K}$ by
\begin{align*}
\Theta &= \set{\theta \in [0,1]^K : \theta_i = 0 \text{ for } i > 1} \\
\Theta’ &= \set{\theta \in [0,1]^K}\,.
\end{align*}
If $2(K-1) \leq V \leq \sqrt{n(K-1)\exp(-2)/8}$ and $\sup_{\theta \in \Theta} R_n(\cA, \theta) \leq V$, then
\begin{align*}
\sup_{\theta’ \in \Theta’} R_n(\cA, \theta’) \geq \frac{n(K-1)}{8V} \exp(-2)\,.
\end{align*}

Proof
Let $\theta \in \Theta$ be given by $\theta_1 = \Delta = (K-1)/V \leq 1/2$. Since the regret in this bandit is $R_n(\cA, \theta) = \Delta \sum_{i=2}^K \E[T_i(n)] \leq V$ by assumption, we have
\begin{align*}
\sum_{i=2}^K \E[T_i(n)] \leq \frac{V}{\Delta}
\end{align*}
and so by the pigeonhole principle there exists an $i > 1$ such that
\begin{align*}
\E[T_i(n)] \leq \frac{V}{(K-1)\Delta} = \frac{1}{\Delta^2}\,.
\end{align*}
Then define $\theta’ \in \Theta’$ by
\begin{align*}
\theta’_j = \begin{cases}
\Delta & \text{if } j = 1 \\
2\Delta & \text{if } j = i \\
0 & \text{otherwise}\,.
\end{cases}
\end{align*}
Then by the usual argument for any event $A$ we have
\begin{align*}
\mathbb{P}_\theta(A) + \mathbb{P}_{\theta’}(A^c)
\geq \frac{1}{2} \exp\left(-\KL(\mathbb{P}_\theta, \mathbb{P}_{\theta'})\right)
= \frac{1}{2} \exp\left(-2 \Delta^2 \E[T_i(n)]\right)
\geq \frac{1}{2} \exp\left(-2\right)\,.
\end{align*}
Choosing $A = \set{T_i(n) \geq n/2}$ and noting that under $\theta$ arm $i$ is $\Delta$-suboptimal, while under $\theta'$ every arm other than $i$ is at least $\Delta$-suboptimal, we get $R_n(\cA, \theta) \geq \frac{n\Delta}{2} \mathbb{P}_\theta(A)$ and $R_n(\cA, \theta') \geq \frac{n\Delta}{2} \mathbb{P}_{\theta'}(A^c)$, and therefore
\begin{align*}
R_n(\mathcal A, \theta) + R_n(\mathcal A, \theta’)
\geq \frac{n\Delta}{4} \exp(-2) = \frac{n(K-1)}{4V} \exp(-2)
\end{align*}
Therefore by the assumption that $R_n(\mathcal A, \theta) \leq V \leq \sqrt{n(K-1) \exp(-2)/8}$ we have
\begin{align*}
R_n(\mathcal A, \theta’) \geq \frac{n(K-1)}{8V} \exp(-2)\,.
\end{align*}
In other words, $V \cdot \sup_{\theta' \in \Theta'} R_n(\cA, \theta') \geq \frac{n(K-1)}{8} \exp(-2)$: a small worst-case regret on the realizable class $\Theta$ forces a large worst-case regret on the full class $\Theta'$.
QED

As promised, we now relate this to unrealizable linear bandits. Suppose that $d = 1$ (an absurd case) and that there are $K$ arms $\cA = \set{x_1, x_2,\ldots, x_{K}}$ where $x_1 = (1)$ and $x_i = (0)$ for $i > 1$. Clearly if the reward is really linear and $\theta > 0$, then the first arm is optimal, while otherwise all of the other arms have the same expected reward (of just $0$). Now simply add $K-1$ coordinates to each action so that the error in the 1-dimensional linear model can be modeled in a higher dimension, and we have exactly the model used in the previous theorem, with $\cA = \set{e_1,e_2,\ldots,e_K}$. Then the theorem shows that (\ref{eq:hope}) is a pipe dream. If $R_n(\cA, \theta) = \tilde O(\sqrt{n})$ for all $\theta \in \Theta$ (the realizable case), then there exists a $\theta' \in \Theta'$ such that $R_n(\cA, \theta') = \tilde \Omega(K \sqrt{n})$. To our knowledge it is still an open question what is possible on this front. Our conjecture is that there is an algorithm for which
\begin{align*}
R_n(\cA, \theta) = \tilde O\left(\min\set{d\sqrt{n} + \epsilon n,\, \frac{K}{d}\sqrt{n}}\right)\,.
\end{align*}
In fact, it is not hard to design an algorithm that tries to achieve this bound by assuming the problem is realizable, but using some additional time to explore the remaining arms up to some accuracy to confirm the hypothesis. We hope to write a post on this in the future, but leave the claim as a conjecture for now.

Asymptotic lower bounds

Like in the finite-armed case, the asymptotic result is proven only for consistent strategies. Recall that a strategy is consistent in some class if the regret is sub-polynomial for any bandit in that class.

Theorem Let $\cA \subset \R^d$ be a finite set that spans $\R^d$ and suppose a strategy satisfies
\begin{align*}
\text{for all } \theta \in \R^d \text{ and } p > 0 \qquad R_n(\cA, \theta) = o(n^p)\,.
\end{align*}
Let $\theta \in \R^d$ be any parameter such that there is a unique optimal action and let $\bar G_n = \E_\theta \left[\sum_{t=1}^n A_t A_t^\top\right]$ be the expected Gram matrix when the strategy interacts with the bandit determined by $\theta$. Then $\liminf_{n\to\infty} \lambda_{\min}(\bar G_n) / \log(n) > 0$ (which implies that $\bar G_n$ is eventually non-singular). Furthermore, for any $x \in \cA$ it holds that:
\begin{align*}
\limsup_{n\to\infty} \log(n) \norm{x}_{\bar G_n^{-1}}^2 \leq \frac{\Delta_x^2}{2}\,.
\end{align*}

The reader should recognize $\norm{x}_{\bar G_n^{-1}}^2$ as the key term in the width of the confidence interval for the least squares estimator. This is quite intuitive. The theorem is saying that any consistent algorithm must prove statistically that all sub-optimal arms are indeed sub-optimal. Before the proof of this result we give a corollary that characterizes the asymptotic regret that must be endured by any consistent strategy.

Corollary
Let $\cA \subset \R^d$ be a finite set that spans $\R^d$ and $\theta \in \R^d$ be such that there is a unique optimal action. Then for any consistent strategy
\begin{align*}
\liminf_{n\to\infty} \frac{R_n(\cA, \theta)}{\log(n)} \geq c(\cA, \theta)\,,
\end{align*}
where $c(\cA, \theta)$ is defined as
\begin{align*}
&c(\cA, \theta) = \inf_{\alpha \in [0,\infty)^{\cA}} \sum_{x \in \cA} \alpha(x) \Delta_x \\
&\quad\text{ subject to } \norm{x}_{H_\alpha^{-1}}^2 \leq \frac{\Delta_x^2}{2} \text{ for all } x \in \cA \text{ with } \Delta_x > 0\,,
\end{align*}
where $H_\alpha = \sum_{x \in \cA} \alpha(x) x x^\top$.
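The constant $c(\cA, \theta)$ is defined by an optimization problem that can be approximated numerically. The sketch below hands it to scipy's SLSQP solver; the function name, the tiny ridge added to keep $H_\alpha$ invertible, and the cap on $\alpha$ are our own choices, so the result should be treated as an illustration rather than a certified solution:

```python
import numpy as np
from scipy.optimize import minimize

def c_lower_bound(actions, theta, alpha_max=1e6):
    """Numerically approximate c(A, theta) from the corollary.

    actions: (K, d) array, theta: (d,) array.  This is an illustrative
    sketch, not a robust solver."""
    actions = np.asarray(actions, float)
    theta = np.asarray(theta, float)
    gaps = actions @ theta
    gaps = gaps.max() - gaps
    subopt = np.where(gaps > 1e-12)[0]

    def H(alpha):
        # H_alpha = sum_x alpha(x) x x^T (tiny ridge keeps it invertible)
        return (actions.T * alpha) @ actions + 1e-9 * np.eye(actions.shape[1])

    def objective(alpha):
        return float(np.dot(alpha, gaps))

    def constraint(alpha, i):
        # Delta_x^2 / 2 - ||x||^2_{H_alpha^{-1}} >= 0 for sub-optimal x
        x = actions[i]
        return gaps[i] ** 2 / 2.0 - x @ np.linalg.solve(H(alpha), x)

    cons = [{"type": "ineq", "fun": constraint, "args": (i,)} for i in subopt]
    alpha0 = np.full(len(actions), 10.0)
    res = minimize(objective, alpha0, constraints=cons,
                   bounds=[(0.0, alpha_max)] * len(actions), method="SLSQP")
    return res.fun, res.x
```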

Proof of Theorem
The proof of the first part is omitted (see the reference below for details). It follows along similar lines to what comes next, the essence being that if $\bar G_n$ is not “sufficiently large” in every direction, then some alternative parameter is not sufficiently identifiable. Let $\theta' \in \R^d$ be an alternative parameter and let $\mathbb{P}$ and $\mathbb{P}'$ be the measures on the sequence of outcomes $A_1,X_1,\ldots,A_n,X_n$ induced by the interaction between the strategy and the bandits determined by $\theta$ and $\theta'$ respectively. Then for any event $E$ we have
\begin{align}
\Prob{E} + \mathbb{P}'(E^c)
&\geq \frac{1}{2} \exp\left(-\KL(\mathbb{P}, \mathbb{P}’)\right) \nonumber \\
&= \frac{1}{2} \exp\left(-\frac{1}{2} \E\left[\sum_{t=1}^n \inner{A_t, \theta - \theta'}^2\right]\right)
= \frac{1}{2} \exp\left(-\frac{1}{2} \norm{\theta - \theta'}_{\bar G_n}^2\right)\,. \label{eq:linear-asy-kl}
\end{align}
A simple re-arrangement shows that
\begin{align*}
\frac{1}{2} \norm{\theta - \theta'}_{\bar G_n}^2 \geq \log\left(\frac{1}{2 \Prob{E} + 2 \mathbb{P}'(E^c)}\right)\,.
\end{align*}
Now we follow the usual plan of choosing $\theta'$ to be close to $\theta$, but so that the optimal action in the bandit determined by $\theta'$ is not $x^*$. Let $\epsilon \in (0, \Delta_{\min})$ and let $H$ be a positive semidefinite matrix to be chosen later such that $\norm{x - x^*}_H^2 > 0$. Then define
\begin{align*}
\theta' = \theta + \frac{\Delta_x + \epsilon}{\norm{x - x^*}^2_H} H(x - x^*)\,,
\end{align*}
which is chosen so that
\begin{align*}
\shortinner{x - x^*, \theta'} = \shortinner{x - x^*, \theta} + \Delta_x + \epsilon = \epsilon\,.
\end{align*}
This means that $x^*$ is not the optimal action for the bandit determined by $\theta'$, and in fact it is at least $\epsilon$-suboptimal. We abbreviate $R_n = R_n(\cA, \theta)$ and $R_n' = R_n(\cA, \theta')$. Then
\begin{align*}
R_n
&= \E_\theta\left[\sum_{x \in \cA} T_x(n) \Delta_x\right]
\geq \frac{n\Delta_{\min}}{2} \Prob{T_{x^*}(n) < n/2} \geq \frac{n\epsilon}{2} \Prob{T_{x^*}(n) < n/2}\,.
\end{align*}
Similarly, $x^*$ is at least $\epsilon$-suboptimal in bandit $\theta'$ so that
\begin{align*}
R_n' \geq \frac{n\epsilon}{2} \mathbb{P}'\left(T_{x^*}(n) \geq n/2\right)\,.
\end{align*}
Therefore
\begin{align}
\Prob{T_{x^*}(n) < n/2} + \mathbb{P}'\left(T_{x^*}(n) \geq n/2\right) \leq \frac{2}{n\epsilon} \left(R_n + R_n'\right)\,. \label{eq:regret-sum}
\end{align}
Note that this holds for practically any choice of $H$ as long as $\norm{x - x^*}_H > 0$. The logical next step is to select the $H$ (which determines $\theta'$) in such a way that (\ref{eq:linear-asy-kl}) is as large as possible. The main difficulty is that this depends on $n$, so instead we aim to choose an $H$ so that the quantity is large enough infinitely often. We start by re-arranging things:
\begin{align*}
\frac{1}{2} \norm{\theta - \theta'}_{\bar G_n}^2
= \frac{(\Delta_x + \epsilon)^2}{2} \cdot \frac{\norm{x - x^*}_{H \bar G_n H}^2}{\norm{x-x^*}_H^4}
= \frac{(\Delta_x + \epsilon)^2}{2 \norm{x - x^*}_{\bar G_n^{-1}}^2} \rho_n(H)\,,
\end{align*}
where we introduced
\begin{align*}
\rho_n(H) = \frac{\norm{x - x^*}_{\bar G_n^{-1}}^2 \norm{x - x^*}_{H \bar G_n H}^2}{\norm{x - x^*}_H^4}\,.
\end{align*}
Therefore by choosing $E$ to be the event that $T_{x^*}(n) < n/2$ and using (\ref{eq:regret-sum}) and (\ref{eq:linear-asy-kl}) we have
\begin{align*}
\frac{(\Delta_x + \epsilon)^2}{2\norm{x - x^*}_{\bar G_n^{-1}}^2} \rho_n(H) \geq \log\left(\frac{n \epsilon}{4R_n + 4R_n'}\right)\,,
\end{align*}
which after re-arrangement leads to
\begin{align*}
\frac{(\Delta_x + \epsilon)^2}{2\log(n)\norm{x - x^*}_{\bar G_n^{-1}}^2} \rho_n(H) \geq 1 - \frac{\log((4R_n + 4R_n')/\epsilon)}{\log(n)}\,.
\end{align*}
The definition of consistency means that $R_n$ and $R_n'$ are both sub-polynomial, which implies that the second term in the previous expression tends to zero for large $n$, and so by sending $\epsilon$ to zero we see that
\begin{align}
\label{eq:lin-lower-liminf}
\liminf_{n\to\infty} \frac{\rho_n(H)}{\log(n) \norm{x - x^*}_{\bar G_n^{-1}}^2} \geq \frac{2}{\Delta_x^2}\,.
\end{align}
We complete the result using proof by contradiction. Suppose that
\begin{align}
\limsup_{n\to\infty} \log(n) \norm{x - x^*}_{\bar G_n^{-1}}^2 > \frac{\Delta_x^2}{2}\,. \label{eq:linear-lower-ass}
\end{align}
Then there exists an $\epsilon > 0$ and infinite set $S \subseteq \N$ such that
\begin{align*}
\log(n) \norm{x - x^*}_{\bar G_n^{-1}}^2 \geq \frac{(\Delta_x + \epsilon)^2}{2} \quad \text{ for all } n \in S\,.
\end{align*}
Therefore by (\ref{eq:lin-lower-liminf}),
\begin{align*}
\liminf_{n \in S} \rho_n(H) > 1\,.
\end{align*}
We now choose $H$ to be a cluster point of the sequence $\{\bar G_n^{-1} / \norm{\bar G_n^{-1}}\}_{n \in S}$, where $\norm{\bar G_n^{-1}}$ is the spectral norm of $\bar G_n^{-1}$. Such a point must exist, since the matrices in this sequence have unit spectral norm by definition and the set of matrices with spectral norm at most one is compact. We let $S' \subseteq S$ be a subset along which $\bar G_n^{-1} / \norm{\bar G_n^{-1}}$ converges to $H$. We now check that $\norm{x - x^*}_H > 0$:
\begin{align*}
\norm{x - x^*}_H^2 = \lim_{n \in S'} \frac{\norm{x - x^*}^2_{\bar G_n^{-1}}}{\norm{\bar G_n^{-1}}}
> 0\,,
\end{align*}
where the last inequality follows from the assumption in (\ref{eq:linear-lower-ass}) and the first part of the theorem. Therefore
\begin{align*}
1 < \liminf_{n \in S} \rho_n(H) \leq \liminf_{n \in S'} \frac{\norm{x - x^*}^2_{\bar G_n^{-1}} \norm{x - x^*}^2_{H \bar G_n H}}{\norm{x - x^*}_H^4} = 1\,,
\end{align*}
which is a contradiction, and so we conclude that (\ref{eq:linear-lower-ass}) does not hold and so
\begin{align*}
\limsup_{n\to\infty} \log(n) \norm{x - x^*}_{\bar G_n^{-1}}^2 \leq \frac{\Delta_x^2}{2}\,.
\end{align*}
QED

We leave the proof of the corollary as an exercise for the reader. Essentially though, any consistent algorithm must choose its actions so that in expectation
\begin{align*}
\norm{x - x^*}^2_{\bar G_n^{-1}} \leq (1 + o(1)) \frac{\Delta_x^2}{2 \log(n)}\,.
\end{align*}
Now since $x^*$ will be chosen linearly often, it is easily shown for sub-optimal $x$ that $\lim_{n\to\infty} \norm{x - x^*}_{\bar G_n^{-1}} / \norm{x}_{\bar G_n^{-1}} = 1$. This leads to the required constraint on the actions of the algorithm, and the optimization problem in the corollary is obtained by minimizing the regret subject to this constraint.

Clouds looming for optimism

The clouds are closing in

The theorem and its corollary have disturbing implications for strategies based on the principle of optimism in the face of uncertainty: they can never be asymptotically optimal! The reason is that optimistic strategies stop choosing actions for which they have collected enough statistics to prove they are sub-optimal, but in the linear setting it can still be worthwhile playing such actions when they are very informative about other actions for which the statistics are not yet so clear. A problematic example already appears in the simplest case in which any information is shared between the arms at all, namely dimension $d = 2$ with $K = 3$ arms.

Specifically, let $\cA = \set{x_1, x_2, x_3}$ where $x_1 = e_1$, $x_2 = e_2$ and $x_3 = (1-\epsilon, \gamma \epsilon)$, with $\gamma \geq 1$ and $\epsilon > 0$ small. Let $\theta = (1, 0)$ so that the optimal action is $x^* = x_1$, while $\Delta_{x_2} = 1$ and $\Delta_{x_3} = \epsilon$. If $\epsilon$ is very small, then $x_1$ and $x_3$ point in nearly the same direction, so choosing only these two arms reveals only very slowly which of $x_1$ or $x_3$ is optimal. On the other hand, choosing $x_2$ gives information about the second component of $\theta$, which is exactly what distinguishes $x_1$ from $x_3$, and so allows a learning agent to quickly identify that $x_1$ is in fact optimal.

A troubling example


We now show how the theorem and corollary demonstrate this. First we calculate the optimal solution of the optimization problem in the corollary. Recall that we are trying to minimize
\begin{align*}
\sum_{x \in \cA} \alpha(x) \Delta_x \qquad \text{subject to } \norm{x}^2_{H(\alpha)^{-1}} \leq \frac{\Delta_x^2}{2} \text{ for all } x \in \cA \text{ with } \Delta_x > 0\,,
\end{align*}
where $H(\alpha) = \sum_{x \in \cA} \alpha(x) x x^\top$. Clearly we should choose $\alpha(x_1)$ arbitrarily large; a computation then shows that
\begin{align*}
\lim_{\alpha(x_1) \to \infty} H(\alpha)^{-1} =
\left[\begin{array}{cc}
0 & 0 \\
0 & \frac{1}{\alpha(x_3)\epsilon^2 \gamma^2 + \alpha(x_2)}
\end{array}\right]\,.
\end{align*}
The constraints then mean that
\begin{align*}
&\frac{1}{\alpha(x_3)\epsilon^2 \gamma^2 + \alpha(x_2)} = \lim_{\alpha(x_1) \to \infty} \norm{x_2}^2_{H(\alpha)^{-1}} \leq \frac{1}{2} \\
&\frac{\gamma^2 \epsilon^2}{\alpha(x_3) \epsilon^2 \gamma^2 + \alpha(x_2)} = \lim_{\alpha(x_1) \to \infty} \norm{x_3}^2_{H(\alpha)^{-1}} \leq \frac{\epsilon^2}{2}\,.
\end{align*}
Provided that $\gamma \geq 1$ this reduces to the single constraint
\begin{align*}
\alpha(x_3) \epsilon^2 \gamma^2 + \alpha(x_2) \geq 2\gamma^2\,.
\end{align*}
Since we are minimizing $\alpha(x_2) + \epsilon \alpha(x_3)$ we can easily see that $\alpha(x_2) = 2\gamma^2$ and $\alpha(x_3) = 0$ provided that $2\gamma^2 \leq 2/\epsilon$. Therefore if $\epsilon$ is chosen sufficiently small relative to $\gamma$, then the optimal rate of the regret is $c(\cA, \theta) = 2\gamma^2$ and so there exists a strategy such that
\begin{align*}
\limsup_{n\to\infty} \frac{R_n(\cA, \theta)}{\log(n)} = 2\gamma^2\,.
\end{align*}
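The claimed solution can be sanity-checked numerically by plugging the candidate allocation into the constraints and the objective. The following self-contained check (with arbitrarily chosen $\gamma = 3$ and $\epsilon = 0.01$) is our own illustration:

```python
import numpy as np

gamma, eps = 3.0, 0.01
x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])
x3 = np.array([1.0 - eps, gamma * eps])
gaps = {"x2": 1.0, "x3": eps}

# Candidate optimal allocation: alpha(x1) huge, alpha(x2) = 2*gamma^2, alpha(x3) = 0.
a1, a2, a3 = 1e9, 2 * gamma ** 2, 0.0
H = a1 * np.outer(x1, x1) + a2 * np.outer(x2, x2) + a3 * np.outer(x3, x3)
Hinv = np.linalg.inv(H)

print(x2 @ Hinv @ x2, gaps["x2"] ** 2 / 2)   # constraint for x2: ~0.056 <= 0.5
print(x3 @ Hinv @ x3, gaps["x3"] ** 2 / 2)   # constraint for x3: binding, both ~5e-5
print(a2 * gaps["x2"] + a3 * gaps["x3"])     # objective value = 2 * gamma^2 = 18
```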
Now we argue that, for $\gamma$ sufficiently large and $\epsilon$ arbitrarily small, the regret of any consistent optimistic algorithm is at least
\begin{align*}
\limsup_{n\to\infty} \frac{R_n(\cA, \theta)}{\log(n)} = \Omega(1/\epsilon)\,,
\end{align*}
which can be arbitrarily worse than the optimal rate! So why is this so? Recall that optimistic algorithms choose
\begin{align*}
A_t = \argmax_{x \in \cA} \max_{\tilde \theta \in \cC_t} \inner{x, \tilde \theta}\,,
\end{align*}
where $\cC_t \subset \R^d$ is a confidence set that we assume contains the true $\theta$ with high probability. So far this does not greatly restrict the class of algorithms that we might call optimistic. We now assume that there exists a constant $c > 0$ such that
\begin{align*}
\cC_t \subseteq \set{\tilde \theta : \norm{\hat \theta_t - \tilde \theta}_{G_t} \leq c \sqrt{\log(n)}}\,,
\end{align*}
Here $\hat \theta_t$ is some estimate of $\theta$ (typically the least squares estimate) and $G_t = \sum_{s=1}^{t-1} A_s A_s^\top$ is the Gram matrix of the actions chosen before round $t$. So how often can we expect the optimistic algorithm to choose action $x_2 = e_2$ in the example described above? Since we have assumed that $\theta \in \cC_t$ with high probability, we have
\begin{align*}
\max_{\tilde \theta \in \cC_t} \shortinner{x_1, \tilde \theta} \geq 1\,.
\end{align*}
On the other hand, if $T_{x_2}(t-1) > 4c^2 \log(n)$, then
\begin{align*}
\max_{\tilde \theta \in \cC_t} \shortinner{x_2, \tilde \theta}
&= \max_{\tilde \theta \in \cC_t} \shortinner{x_2, \tilde \theta - \theta} \\
&\leq 2 c \norm{x_2}_{G_t^{-1}} \sqrt{\log(n)} \\
&\leq 2 c \sqrt{\frac{\log(n)}{T_{x_2}(t-1)}} \\
&< 1\,,
\end{align*}
which means that $x_2$ will not be chosen more than $1 + 4c^2 \log(n)$ times. So if $\gamma = \Omega(c^2)$, then the optimistic algorithm does not choose $x_2$ sufficiently often, and a simple computation shows that it must instead choose $x_3$ at least $\Omega(\log(n)/\epsilon^2)$ times, suffering regret of $\Omega(\log(n)/\epsilon)$. The key take-away is that optimistic algorithms never choose actions they can already prove are statistically sub-optimal, but for linear bandits it can be optimal to choose such actions more often in order to gain information about other actions.
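To see this effect empirically, here is a small simulation of an ellipsoidal-UCB strategy of the above form on the three-armed example. Everything here (the regularization $\lambda$, the radius $c\sqrt{\log n}$, the constants) is our own illustrative choice rather than a tuned algorithm, but the qualitative behaviour, with $x_2$ starved and $x_3$ over-played, is the point:

```python
import numpy as np

def linucb_counts(gamma, eps, n, c=1.0, lam=1.0, seed=0):
    """Run an ellipsoidal-UCB strategy on the three-armed example and
    return how often each arm was played.  Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    actions = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0 - eps, gamma * eps]])
    theta = np.array([1.0, 0.0])
    V = lam * np.eye(2)            # regularized Gram matrix
    b = np.zeros(2)                # sum of X_t * A_t
    counts = np.zeros(3, dtype=int)
    radius = c * np.sqrt(np.log(n))
    for t in range(n):
        theta_hat = np.linalg.solve(V, b)
        Vinv = np.linalg.inv(V)
        ucb = actions @ theta_hat + radius * np.sqrt(
            np.einsum("ij,jk,ik->i", actions, Vinv, actions))
        k = int(np.argmax(ucb))
        x = actions[k]
        reward = x @ theta + rng.normal()
        V += np.outer(x, x)
        b += reward * x
        counts[k] += 1
    return counts

print(linucb_counts(gamma=10.0, eps=0.02, n=20000))
# Typically x_2 is played only O(log n) times while x_3 is played far more,
# even though playing x_2 a little more would identify x_1 much faster.
```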

Notes

Note 1: The worst-case bound demonstrates the near-optimality of the OFUL algorithm for a specific action-set. It is an open question to characterize the optimal regret for a wide range of action-sets. We will return to these issues soon when we discuss adversarial linear bandits.

Note 2: There is an algorithm that achieves the asymptotic lower bound (see references below), but so far there is no algorithm that is simultaneously asymptotically optimal and (near) minimax optimal.

Note 3: The assumption that $x^*$ was unique in the asymptotic results can be relaxed at the price of a little more work, and simple (natural) modifications to the theorem statements.

References

Worst-case lower bounds for stochastic bandits have appeared in a variety of places, all with roughly the same bound, but for different action sets. Our very simple proof is new, but takes inspiration mostly from the paper by Shamir.

The asymptotic lower bound (along with a strategy for which the upper bound matches) is by the authors.

The example used to show optimistic approaches cannot achieve the optimal rate has been used before in the pure exploration setting where the goal is to simply find the best action, without the constraint that the regret should be small.

We should also mention that examples have been constructed before demonstrating the need for carefully balancing the trade-off between information and regret. For many examples, and a candidate design principle for addressing these issues see

The results for the unrealizable case are inspired by the work of one of the authors on the pareto regret frontier for bandits, which characterizes what trade-offs are available when it is desirable to have a regret that is unusually small relative to some specific arms.

Stochastic Linear Bandits and UCB

Recall that in the adversarial contextual $K$-action bandit problem, at the beginning of each round $t$ a context $c_t\in \Ctx$ is observed. The idea is that the context $c_t$ may help the learner to choose a better action. This led us to change the benchmark in the definition of regret. In this post we start by reviewing how contextual bandit problems can be defined in the stochastic setting. We use this setting to motivate the introduction of stochastic linear bandits, a fascinatingly rich model with much structure, which will be the topic of the next few posts. Besides defining stochastic linear bandits we also cover how UCB can be generalized to this setting.

Stochastic contextual bandits

In the standard $K$-action stochastic contextual bandit problem, at the beginning of round $t$ the learner observes a context $C_t\in \Ctx$. The context may or may not be random. Next, the learner chooses its action $A_t\in [K]$ based on the information available. So far there is no difference from the adversarial setting. The difference comes from the assumption that the reward $X_t$ incurred satisfies
\begin{align*}
X_t = r(C_t,A_t) + \eta_t\,,
\end{align*}
where $r:\Ctx\times [K]\to \R$ is the so-called reward function, which is unknown to the learner, while $\eta_t$ is random noise.

In particular, the assumption on the noise is as follows: Let
\begin{align*}
\cF_t = \sigma(C_1,A_1,X_1,\dots,C_{t-1},A_{t-1},X_{t-1},C_t,A_t)
\end{align*}
be the $\sigma$-field summarizing the information available just before $X_t$ is observed. Then, given the past, we assume that $\eta_t$ is conditionally $1$-subgaussian:
\begin{align*}
\EE{ \exp( \lambda \eta_t ) | \cF_t } \le \exp\left(\frac{\lambda^2}{2}\right) \quad \text{for all } \lambda \in \R\,.
\end{align*}
(The constant $1$ is chosen to minimize the number of symbols; there is no difficulty considering the more general case of conditionally $\sigma^2$-subgaussian noise.) As discussed beforehand, subgaussian random variables have zero mean, hence the above assumption also implies that $\EE{\eta_t|\cF_t}=0$, or $\EE{X_t|\cF_t}=r(C_t,A_t)$. In words, for any given $(c,a)\in \Ctx\times [K]$ context-action pair, $r(c,a)$ gives the mean reward of action $a$ in context $c$.

If $r$ were known, then the learner wishing to maximize the total expected reward would choose the action $A_t^* = \argmax_{a\in [K]} r(C_t,a)$ in round $t$ (if multiple maximizers exist, choose one of them). The loss due to the lack of knowledge of $r$ makes the learner incur the (expected) regret
\begin{align*}
R_n = \EE{ \sum_{t=1}^n \max_{a\in[K]} r(C_t,a) - \sum_{t=1}^n X_t }\,.
\end{align*}
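The interaction protocol and the regret accounting can be summarized in a few lines of Python; the toy reward table and the uniformly random choices below are our own illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_contexts, K, n = 3, 4, 1000
r = rng.uniform(size=(n_contexts, K))      # unknown reward table r(c, a)

regret = 0.0
for t in range(n):
    c = rng.integers(n_contexts)           # context C_t (here: uniformly random)
    a = rng.integers(K)                    # learner's choice A_t (here: uniform, for illustration)
    x = r[c, a] + rng.normal()             # reward X_t = r(C_t, A_t) + eta_t
    regret += r[c].max() - r[c, a]         # pseudo-regret against the best action for C_t
print(regret)
```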

Towards linear bandits

To act eventually optimally, the learner may estimate $r(c,a)$ for each $(c,a)\in \Ctx\times [K]$ pair. Similarly to what happens in the adversarial setting, this is ineffective when the number of context-action pairs is large, because the regret can then be very large for a long time. In particular, just like in the adversarial case, the worst-case regret over all possible contextual problems with $M$ contexts and mean reward in $[0,1]$ is at least $\Omega( \sqrt{nMK} )$. One refinement of the bound is to replace $M$ by the number of effective contexts, i.e., contexts that appear frequently. But if the context contains a lot of detailed information, this may still be very high. However, if the reward function enjoys additional structure, this worst-case argument fails. The additional structure may come in many different forms. “Smoothness”, which was also mentioned previously when we discussed adversarial contextual bandits, is one example.

An alternative (but related) assumption uses the linear structure of the set $\R^{\Ctx\times [K]}$ of all possible reward functions (recall that real-valued functions form a vector space over the reals). The so-called linearity assumption postulates that $r$ belongs to a low-dimensional linear subspace $\cS$ of $\cV \doteq \R^{\Ctx\times [K]}$.

A somewhat finer condition is to assume a specific “parameterization” of the subspace $\cS$. This works as follows: It is assumed that the learner is given access to a map $\psi: \Ctx \times [K] \to \R^d$ and that with some unknown parameter vector $\theta_*\in \R^d$,
\begin{align*}
r(c,a) = \theta_*^\top \psi(c,a)\,, \qquad \forall (c,a)\in \Ctx\times [K]\,.
\end{align*}
The map $\psi$, in line with the machine learning literature, is called a feature-map. Assuming the knowledge of $\psi$ is equivalent to knowing the linear subspace $\cS$. The refinement of the subspace condition comes from extra assumptions on $\theta_*$, such as that the magnitude of $\theta_*$ as measured in a certain norm $\norm{\cdot}$ is “small”, or even that $\norm{\theta_*}\le B$ with $B$ known, or, more generally, that $\theta_*\in \Theta$ for some known (small) set $\Theta\subset \R^d$, or a mix of these.

The idea of feature maps is best illustrated with an example: If the context describes the visitor of a website selling books, the actions are books to recommend, and the reward is the revenue on a book sold, then the features could indicate the interests of the visitors as well as the domain and topic of the books. If the visitors and books are assigned to finitely many categories, indicator variables of all possible combinations of these categories could be used to create the feature map. Of course, many other possibilities exist. One such possibility is to train a neural network (deep or not) on historical data to predict the revenue and use the nonlinear map obtained by removing the last layer of the neural network. The subspace $\Psi$ spanned by the feature vectors $\{\psi(c,a)\}_{c,a}$ in $\R^d$ is called the feature-space.
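As a concrete (and entirely made-up) instance of such an indicator feature map, here is a small sketch; the category names and the value of $\theta_*$ are hypothetical:

```python
import numpy as np

# Hypothetical categorical descriptions of contexts (visitors) and actions (books).
visitor_categories = ["fiction-lover", "scientist", "casual"]
book_topics = ["novel", "math", "cooking"]
d = len(visitor_categories) * len(book_topics)

def psi(visitor, book):
    """Indicator feature map over (visitor category, book topic) pairs."""
    phi = np.zeros(d)
    i = visitor_categories.index(visitor)
    j = book_topics.index(book)
    phi[i * len(book_topics) + j] = 1.0
    return phi

# r(c, a) = <theta_*, psi(c, a)>: each entry of theta_* is the expected
# revenue for one (visitor category, book topic) combination.
theta_star = np.arange(d) / d
print(psi("scientist", "math") @ theta_star)
```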

An assumption on $\norm{\theta_*}$ encodes smoothness of $r$. In particular, from Hölder’s inequality,
\begin{align*}
|r(c,a)-r(c',a')| \le \norm{\theta_*} \norm{\psi(c,a)-\psi(c',a')}_*\,,
\end{align*}
where $\norm{\cdot}_*$ denotes the dual of $\norm{\cdot}$. This is how $\psi$ implicitly encodes “smoothness”. Restrictions on $\norm{\theta_*}$ have a similar effect to assuming that the dimensionality $d$ of the subspace $\cS$ is small. In fact, one may push this to the extreme and allow $d$ to be infinite, which can buy tremendous flexibility. With this much flexibility the linearity assumption perhaps feels less limiting.

Another assumption, similar to yet different from the previously mentioned ones, is that $\theta_*$ is sparse. By this we mean that many entries of $\theta_*$ are zero. Sometimes this is expressed by saying that the $0$-norm of $\theta_*$ is small. Here, the zero-norm, $\norm{\theta_*}_0$, just counts the number of nonzero entries of $\theta_*$: $\norm{\theta_*}_0 = |\{ i \,:\, \theta_{*,i}\ne 0 \}|$. The point of the sparsity assumption is to remove from the designer of the feature map the burden of leaving out components that are unnecessary to get a good approximation of $r$: the designer may then include any component that may be relevant in predicting the reward, enforcing the selection of the “relevant” features by imposing a constraint on the number of nonzero entries of $\theta_*$.

Stochastic linear bandits

Stochastic linear bandits arise from realizing that when the reward is linear in the feature vectors, the identity of the actions becomes secondary and we may as well let the algorithm choose the feature vectors directly: the identity of the actions adds no information or structure to the problem. This results in the following model: In round $t$, the learner is given the decision set $\cD_t\subset \R^d$ from which it has to choose an action. If the learner chooses $A_t\in \cD_t$, it incurs the reward
\begin{align*}
X_t = \ip{A_t,\theta_*} + \eta_t\,,
\end{align*}
where $\eta_t$ is $1$-subgaussian given $\cD_1,A_1,X_1,\dots,\cD_{t-1},A_{t-1},X_{t-1},\cD_t$ and $A_t$. The regret suffered is
\begin{align*}
R_n = \EE{ \sum_{t=1}^n \max_{a\in \cD_t} \ip{a,\theta_*} - \sum_{t=1}^n X_t }\,.
\end{align*}

With $\cD_t = \{ \psi(C_t,a) \,:\, a\in [K] \}$ the model reproduces contextual bandits. When $\cD_t = \{e_1,\dots,e_d\}$ (where $e_1,\dots,e_d$ are the unit vectors of the standard Euclidean basis), stochastic linear bandits reproduce finite-action stochastic bandits. Stochastic linear bandits also arise naturally with combinatorial action sets, i.e., when $\cD \subset \{0,1\}^d$: Many combinatorial problems (such as matching, least-cost problems in directed graphs, choosing spanning trees, etc.) can be written as linear optimization over some combinatorial set $\cD$ obtained from considering incidence vectors, often associated with some graph. We hope to cover some of these fun problems later.
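The two special cases mentioned above are easy to write down explicitly; the feature map in the sketch below is a made-up illustration, not a prescribed construction:

```python
import numpy as np

d, K = 4, 4

# Finite-armed bandit as a linear bandit: D_t = {e_1, ..., e_d} every round.
finite_armed_D = np.eye(d)

# Contextual bandit as a linear bandit: D_t = {psi(C_t, a) : a in [K]},
# here with a made-up feature map psi for illustration.
def psi(context, a):
    v = np.zeros(d)
    v[a % d] = 1.0 + 0.1 * context
    return v

def decision_set(context):
    return np.stack([psi(context, a) for a in range(K)])

theta_star = np.array([0.3, 0.1, 0.4, 0.2])
D_t = decision_set(context=2)
print(D_t @ theta_star)     # mean rewards of the K available actions this round
```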

UCB for linear bandits

The UCB algorithm is a very attractive algorithm for finite-action stochastic bandits: It is near-minimax optimal and is also almost instance optimal for any finite horizon and even asymptotically. It is thus quite natural to attempt to generalize UCB to the linear settings.

The generalization is based on the view that UCB implements the optimism in the face of uncertainty principle, according to which one should choose the actions as if the environment (in our case the linear bandit environment) were as nice as plausibly possible. In finite-action stochastic bandit problems the principle dictates choosing the action with the largest upper confidence bound. In the case of linear bandit problems this still holds, but now to calculate the upper confidence bounds one should take into account the information conveyed by all the rewards observed, because all the data $(A_1,X_1,\dots,A_{t-1},X_{t-1})$ is now connected through the unknown parameter vector.

One idea is to construct a “confidence set” $\cC_{t}$ based on $(A_1,X_1,\dots,A_{t-1},X_{t-1})$ that contains the unknown parameter vector $\theta_*$ with high probability. Leaving details of how the confidence set is constructed aside for a moment but assuming that the confidence set indeed contains $\theta_*$, for any given action $a\in \R^d$,
\begin{align}
\mathrm{UCB}_t(a) = \max_{\theta\in \cC_t} \ip{a,\theta}
\label{eq:UCBCt}
\end{align}
will be an upper bound on the mean payoff of $a$. The UCB algorithm that uses the confidence set $\cC_t$ at time $t$ then selects
\begin{align}
A_t = \argmax_{a\in \cD_t} \mathrm{UCB}_t(a)\,.
\label{eq:UCBgencDt}
\end{align}
In fact, the last equation is what we will take as the definition of UCB regardless of how the $\UCB_t(\cdot)$ values are defined. Of course the naming is justified only when the UCB values are indeed upper bounds on the mean payoffs of the actions.

Depending on the authors, UCB applied to linear bandits is known by many names, including but not limited to LinRel (after perhaps Linear Reinforcement Learning), LinUCB (an obvious choice), and OFUL (Optimism in the Face of Uncertainty for Linear bandits), just to mention a few.

Computation

Note that as long as $\cD_t$ contains only a few vectors and the linear optimization problem \eqref{eq:UCBCt} can be efficiently solved (such as when $\cC_t$ is convex), the computation is efficient. To discuss the computational cost further, note that the computation of $A_t$ can also be written as
\begin{align}
(A_t,\tilde \theta_t) = \argmax_{(a,\theta)\in \cD_t\times \cC_t} \ip{a,\theta}\,.
\label{eq:UCBgenjoint}
\end{align}
This is a bilinear optimization problem over the set $\cD_t \times \cC_t$. In general, nothing much can be said about the computational efficiency of solving this problem. One special case when a solution can be found efficiently is when (i) the linear optimization problem $\max_{a\in \cD} \ip{a,\theta}$ can be efficiently solved for any $\theta$ (this holds in many combinatorial settings) and (ii) $\cC_t$ is the convex hull of a handful of vertices: $\cC_t = \mathrm{co}(c_{t1},\dots,c_{tp})$. Indeed, in this case for any $a\in \cD_t$, $\argmax_{\theta\in \cC_t} \ip{a,\theta}$ is one of $c_{t1},\dots,c_{tp}$. Hence, the solution to \eqref{eq:UCBgenjoint} can be obtained by solving $\max( \max_{a\in \cD_t} \ip{a,c_{t1}}, \dots, \max_{a\in \cD_t} \ip{a,c_{tp}} )$. A special case of this is when $\cC_t$ is the skewed and shifted $\ell^1$-ball, i.e. when for some nonsingular matrix $A$ and vector $\theta_0\in \R^d$, $\cC_t = \{ \theta\,:\, \norm{A(\theta – \theta_0)}_1 \le \beta_t \}$. Note that in this case $p=2d$.
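Here is a sketch of this decomposition. The helper names and the toy sets are our own illustration; `solve_lin_opt` stands in for whatever efficient linear optimizer over $\cD_t$ is available:

```python
import numpy as np

def ucb_action(vertices, solve_lin_opt):
    """Pick (A_t, theta_t) maximizing <a, theta> over D_t x co(vertices).

    vertices: (p, d) array of the extreme points c_{t1}, ..., c_{tp};
    solve_lin_opt(theta) must return the action in D_t maximizing <a, theta>."""
    best_val, best_action = -np.inf, None
    for v in vertices:
        a = solve_lin_opt(v)               # inner linear optimization over D_t
        val = a @ v
        if val > best_val:
            best_val, best_action = val, a
    return best_action, best_val

# Toy usage: D_t is a small finite set, C_t is the convex hull of two vertices.
D = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
vertices = np.array([[0.9, 0.2], [0.1, 1.0]])
action, value = ucb_action(vertices, lambda th: D[np.argmax(D @ th)])
print(action, value)
```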

Another notable case is when $\cC_t$ is an ellipsoid. To minimize clutter in writing the definition of an ellipsoid, let us introduce the notation $\norm{x}^2_V$, which is defined for a $d\times d$ positive definite matrix $V$ and takes the value $x^\top V x$. The notation is justified since $\norm{\cdot}_{V}$ is indeed a norm. We will call $\norm{x}_V$ the $V$-norm of $x$. With this, choosing some center $\hat \theta\in \R^d$, a positive definite matrix $V\succ 0$ and “radius” $\beta$, an ellipsoidal confidence set takes the form $\cC_t = \{\theta\in \R^d \,:\, \norm{\theta-\hat \theta}^2_{V}\le \beta \}$ (in general, $\hat \theta$, $V$ and $\beta$ will depend on past observations, and the reason for not absorbing $\beta$ into the definition of $V$ will become clear later).

When $\cC_t$ is of this ellipsoidal form, the UCB values are particularly simple. To see this, first we rewrite $\cC_t$ into an alternate, equivalent form. Defining $B_2 = \{x\in \R^d\,:\, \norm{x}_2\le 1\}$ to be the unit ball with respect to the Euclidean norm, it is easy to see that $\cC_t = \hat \theta + \beta^{1/2} V^{-1/2} B_2$. Using this, a short direct calculation gives
\begin{align}\label{eq:linucb}
\mathrm{UCB}_t(a) = \ip{a,\hat \theta} + \beta^{1/2} \norm{a}_{V^{-1}}\,.
\end{align}
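In code, the ellipsoidal UCB value of \eqref{eq:linucb} is one line, and it can be checked against the definition \eqref{eq:UCBCt} by maximizing over the ellipsoid explicitly; the following sketch is our own illustration:

```python
import numpy as np

def ucb_value(a, theta_hat, V, beta):
    """UCB_t(a) = <a, theta_hat> + sqrt(beta) * ||a||_{V^{-1}} for an
    ellipsoidal confidence set {theta : ||theta - theta_hat||_V^2 <= beta}."""
    width = np.sqrt(a @ np.linalg.solve(V, a))   # ||a||_{V^{-1}}
    return a @ theta_hat + np.sqrt(beta) * width

# Sanity check against the definition as a maximum over the ellipsoid:
rng = np.random.default_rng(0)
d = 3
a = rng.normal(size=d)
theta_hat = rng.normal(size=d)
M = rng.normal(size=(d, d))
V = M @ M.T + np.eye(d)
beta = 2.0
# The maximizer over the ellipsoid is theta_hat + sqrt(beta) V^{-1} a / ||a||_{V^{-1}}.
direction = np.linalg.solve(V, a)
theta_max = theta_hat + np.sqrt(beta) * direction / np.sqrt(a @ direction)
print(ucb_value(a, theta_hat, V, beta), a @ theta_max)   # equal
```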

Note the similarity to the standard finite-action UCB algorithm: Interpreting $\hat \theta$ as the estimate of $\theta_*$, $\ip{a,\hat \theta}$ can be seen as the estimate of the mean reward of $a$, while $\beta^{1/2} \norm{a}_{V^{-1}}$ is a bonus term. As we shall see later the confidence values could be defined by setting $\hat \theta$ to be the least–squares estimator (LSE) of $\theta_*$ with perhaps an appropriate regularization, while $V$ could be set to $V_t$, the “regularized Grammian” matrix defined using
\begin{align}
V_t=V_0 + \sum_{s=1}^{t} A_s A_s^\top, \qquad t\in [n]\,,
\label{eq:linbanditreggrammian}
\end{align}
where $V_0\succ 0$ is a fixed positive definite matrix, which is often set to $\lambda I$ with some tuning parameter $\lambda>0$.

When $\cD_t = \{e_1,\dots,e_d\}$ for all $t\in [n]$, $\hat \theta_i$ becomes the empirical mean of action $e_i$ (which could in this case be meaningfully called the $i$th action), while with $V_0=0$ the matrix $V_t$ is diagonal with its $i$th diagonal entry being the number of times action $e_i$ is used up to and including round $t$.

Soon we will see why the choice of $(V_t)_t$ is natural. The impatient reader may check that in the special case when $\eta_t$ is an i.i.d. zero-mean Gaussian sequence and the vectors $(A_t)_t$ are deterministically chosen (which is clearly far from what happens in UCB), $V_t$ with $V_0=0$ comes up naturally when defining a confidence set for $\theta_*$. The behavior of UCB is illustrated on the figure below.

Illustration of the behavior of UCB. Three actions, labeled $a_1$, $a_2$ and $a_3$ live in the positive quadrant of $\R^2$. The plane, which also hosts $\theta_*$ is colored based on the identity of the optimal action as a function of where $\theta_*$ is. A confidence ellipsoid with center $\hat \theta$ is also shown as are the locations of the parameter vectors that give rise to the highest value for each of the actions. Action $a_1$ happens to have the highest UCB value.

A Generic Bound on the Regret of UCB

In this section we bound the regret of UCB as a function of the width of the confidence intervals used by UCB without explicitly specifying how the confidence bounds are constructed, though we make a specific assumption about the form of the width. Later we will specialize this result to two different settings. In particular, our assumptions are as follows:

  • Bounded scalar mean reward: $|\ip{a,\theta_*}|\le 1$ for any $a\in \cup_t \cD_t$;
  • Bounded actions: for any $a\in \cup_t \cD_t$, $\norm{a}_2 \le L$;
  • Honest confidence intervals: With probability $1-\delta$, for all $t\in [n]$, $a\in \cD_t$, $\ip{a,\theta_*}\in [\UCB_t(a)-2\sqrt{\beta_{t-1}} \norm{a}_{V_{t-1}^{-1}},\UCB_t(a)]$ where $(V_{t})_t$ are given by \eqref{eq:linbanditreggrammian}.

Note that the half-width $\sqrt{\beta_{t-1}} \norm{a}_{V_{t-1}^{-1}}$ used in the assumption is the same as one gets when a confidence set $\cC_t$ is used which satisfies
\begin{align}
\cC_t \subset
\cE_t \doteq \{ \theta \in \R^d \,:\,
\norm{\theta-\hat \theta_{t-1}}_{ V_{t-1}}^2 \le \beta_{t-1} \}\,
\label{eq:ellconf}
\end{align}
with some $\hat \theta_{t-1}\in \R^d$.

Our first main result is as follows:

Theorem (Regret of UCB for Linear Bandits): Let the conditions listed above hold. Further, assume that $(\beta_t)_t$ is nondecreasing and $\beta_{n-1}\ge 1$. Then, with probability $1-\delta$, the pseudo-regret $\hat R_n = \sum_{t=1}^n \max_{a\in \cD_t} \ip{a,\theta_*} – \ip{A_t,\theta_*}$ of UCB as defined by \eqref{eq:UCBgenjoint} satisfies
\begin{align*}
\hat R_n
\le \sqrt{ 8 n \beta_{n-1} \, \log \frac{\det V_{n}}{ \det V_0 } }
\le \sqrt{ 8 d n \beta_{n-1} \, \log \frac{\trace(V_0)+n L^2}{ d\det^{1/d} V_0 } }\,.
\end{align*}

Note that the condition on $(\beta_t)_t$ is not really restrictive because we can always arrange for it to hold at the price of increasing $\beta_1,\dots,\beta_n$. We also see from the result that to get an $\tilde O(\sqrt{n})$ regret bound one needs to show that $\beta_n$ has a polylogarithmic growth (here, $\tilde O(f(n))$ means $O( \log^p(n) f(n) )$ with some $p>0$). We can also get a bound on the (expected) regret $R_n$ if $\delta\le c/\sqrt{n}$ by combining the bound of the theorem with $\hat R_n \le 2n$, which follows from our assumption that the magnitude of the immediate reward is bounded by one.

Proof: It suffices to prove the bound on the event when the confidence intervals contain the expected rewards of all the actions available in all the respective rounds. Hence, in the remainder of the proof we will assume that this holds. Let $r_t = \max_{a\in \cD_t} \ip{a,\theta_*} – \ip{A_t,\theta_*}$ be the immediate pseudo-regret suffered in round $t\in [n]$.

Let $A_t^* = \argmax_{a\in \cD_t} \ip{a,\theta_*}$ be an optimal action for round $t$. Thanks to the choice of $A_t$, $\ip{A_t^*,\theta_*}\le \UCB_t(A_t^*) \le \UCB_t(A_t)$. By assumption $\ip{A_t,\theta_*}\ge \UCB_t(A_t)-2\beta_{t-1}^{1/2} \norm{A_t}_{V_{t-1}^{-1}}$. Combining these inequalities we get
\begin{align*}
r_t \le 2 \sqrt{\beta_{t-1}} \norm{A_t}_{V_{t-1}^{-1}}\,.
\end{align*}

By the assumption that the mean absolute reward is bounded by one we also get that $r_t\le 2$. This combined with $\beta_{n-1} \ge \max(\beta_{t-1},1)$ gives
\begin{align*}
r_t
& \le 2 \wedge 2 \sqrt{\beta_{t-1}} \norm{A_t}_{V_{t-1}^{-1}}
\le 2 \sqrt{\beta_{n-1}} (1 \wedge \norm{A_t}_{V_{t-1}^{-1}})\,,
\end{align*}
where, as before, $a\wedge b = \min(a,b)$.

Jensen's inequality (or Cauchy-Schwarz) gives $\hat R_n \le \sqrt{n \sum_t r_t^2 }$. So it remains to bound $\sum_t r_t^2$. This is done by applying the following technical lemma:

Lemma (Elliptical Potential): Let $x_1,\dots,x_n\in \R^d$, $V_t = V_0 + \sum_{s=1}^t x_s x_s^\top$, $t\in [n]$, $v_0 = \trace(V_0)$ and $L\ge \max_t \norm{x_t}_2$. Then,
\begin{align*}
\sum_{t=1}^n 1 \wedge \norm{x_t}_{V_{t-1}^{-1}}^2 \le 2 \log \frac{\det V_{n}}{\det V_0} \le 2 d \log \frac{v_0+n L^2}{d\det^{1/d} V_0}\,.
\end{align*}

The idea underlying the proof of this lemma is to first use that $u\wedge 1 \le 2 \ln(1+u)$ and then use an algebraic identity to prove that the sum of the logarithmic terms is equal to $\log \frac{\det V_n}{\det V_0}$. Then one can bound $\log \det V_{n}$ by noting that $\det V_{n}$ is the product of the eigenvalues of $V_{n}$, while $\trace V_{n}$ is the sum of the eigenvalues and then using the AM-GM inequality. The full proof is given at the end of the post.

Applying the bound and putting everything together gives the bound stated in the theorem.

QED.

In the next post we will look into how to construct tight ellipsoidal confidence sets. The theorem just proven will then be used to derive a regret bound.

Proof of the Elliptical Potential Lemma

Recall that we want to prove that
\begin{align*}
\sum_{t=1}^n 1 \wedge \norm{x_t}_{V_{t-1}^{-1}}^2 \le 2 \log \frac{\det V_{n}}{\det V_0} \le 2 d \log \frac{v_0+n L^2}{d\det^{1/d} V_0}\,.
\end{align*}
where $x_1,\dots,x_n\in \R^d$, $V_t = V_0+\sum_{s=1}^t x_s x_s^\top$, $t\in [n]$, $v_0=\trace V_0$, and $\max_t \norm{x_t}_2\le L$ (cf. here).

Using that for any $u\ge 0$, $u \wedge 1 \le 2 \ln(1+u)$, we get
\begin{align*}
\sum_{t=1}^n 1 \wedge \norm{x_t}_{V_{t-1}^{-1}}^2 \le 2 \sum_t \log(1+\norm{x_t}_{V_{t-1}^{-1}}^2)\,.
\end{align*}
Now, we argue that $\sum_t \log(1+\norm{x_t}_{V_{t-1}^{-1}}^2) = \log \frac{\det V_{n}}{\det V_0}$. For $t \ge 1$ we have $V_{t} = V_{t-1} + x_t x_t^\top = V_{t-1}^{1/2}(I+V_{t-1}^{-1/2} x_t x_t^\top V_{t-1}^{-1/2}) V_{t-1}^{1/2}$. Hence, $\det V_t = \det(V_{t-1}) \det (I+V_{t-1}^{-1/2} x_t x_t^\top V_{t-1}^{-1/2})$. Now, it is easy to check that the only eigenvalues of $I + y y^\top$ are $1+\norm{y}_2^2$ and $1$ (this is because the matrix is symmetric, hence its eigenvectors are orthogonal to each other, and $y$ is an eigenvector with eigenvalue $1+\norm{y}_2^2$). Putting things together we see that $\det V_{n} = \det (V_0) \prod_{t=1}^{n} (1+ \norm{x_t}^2_{V_{t-1}^{-1}})$, which is equivalent to the claimed identity and hence gives the first inequality. To get the second inequality note that by the AM-GM inequality, $\det V_n = \prod_{i=1}^d \lambda_i \le (\frac1d \trace V_n)^d \le ((v_0+n L^2)/d)^d$, where $\lambda_1,\dots,\lambda_d$ are the eigenvalues of $V_n$.
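The lemma is easy to test numerically; the following self-contained check of both inequalities with random vectors is our own illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, L = 5, 200, 2.0
V0 = np.eye(d)
xs = rng.uniform(-1.0, 1.0, size=(n, d))
xs *= L / np.maximum(np.linalg.norm(xs, axis=1, keepdims=True), 1e-12)  # ||x_t||_2 = L

V = V0.copy()
lhs = 0.0
for x in xs:
    lhs += min(1.0, x @ np.linalg.solve(V, x))   # 1 ^ ||x_t||^2_{V_{t-1}^{-1}}
    V += np.outer(x, x)

sign, logdet_Vn = np.linalg.slogdet(V)
_, logdet_V0 = np.linalg.slogdet(V0)
middle = 2.0 * (logdet_Vn - logdet_V0)
right = 2.0 * d * np.log((np.trace(V0) + n * L ** 2)
                         / (d * np.exp(logdet_V0 / d)))
print(lhs <= middle <= right, lhs, middle, right)
```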

References

The literature of linear bandits is rather large. Here we restrict ourselves to the works most relevant to this post.

Stochastic linear bandits were introduced into the machine learning literature by Abe and Long:

(An earlier version of this paper from 1999 can be found here.)

The first paper to consider UCB-style algorithms is by Peter Auer:

This paper considered the case when the number of actions is finite. The core ideas of the analysis of optimistic algorithms (and more) is already present in this paper.

An algorithm that is based on a confidence ellipsoid is described in the paper by Varsha Dani, Thomas Hayes and Sham Kakade:

The regret analysis presented here, just like our discussion of the computational questions (e.g., the use of $\ell^1$-confidence balls) is largely based on this paper. The paper also stresses that an expected regret of $\tilde O( d \sqrt{n} )$ can be achieved regardless of the shape of the decision sets $\cD_t$ as long as the immediate reward belongs to a bounded interval (this also holds for the result presented here). They also give a lower bound for the expected regret which shows that in the worst case the regret must scale with $\Omega(\min(n,d \sqrt{n}))$. In a later post we will give an alternate construction.

A variant of SupLinRel that is based on ridge regression (as opposed to LinRel, which is based on truncating the smallest eigenvalue of the Grammian) is described in

The authors of this paper call the UCB algorithm described in this post LinUCB, while the previous paper calls an essentially identical algorithm OFUL (after optimism in the face of uncertainty for linear bandits).

The paper by Paat Rusmevichientong and John N. Tsitsiklis considers both optimistic and explore-then-commit strategies (which they call PEGE):

PEGE is shown to be optimal up to logarithmic factors for the unit ball.

The observation that explore-then-commit works for the unit ball and in general for sets with a smooth boundary was independently made in

and was also included in the Masters thesis of Yasin Abbasi-Yadkori.

Notes

Note: There is entire line of research for studying bandits with smooth reward functions, contextual or not. See, e.g., here, here, here, here, here or here.

Note: Nonlinear structured bandits where the payoff function belongs to a known set is studied e.g. here, here and here.

Note: It was mentioned that the feature map $\psi$ may map its arguments to an infinite dimensional space. The question then is whether computationally efficient methods exist at all. The answer is yes when $\Psi$ is equipped with an inner product $\ip{\cdot,\cdot}$ such that for any $(c,a),(c’,a’)$ context-action pairs $\ip{\psi(c,a),\psi(c’,a’)} = \kappa( (c,a), (c’,a’) )$ for some kernel function $\kappa: (\cC\times [K]) \times (\cC\times [K]) \to \R$ that can quickly be evaluated for any arguments of interest. Learning algorithms that use such an implicit calculation of inner products by means of evaluating some “kernel function” are said to use the “kernel trick”. Such methods in fact never calculate $\psi$, which is only implicitly defined in terms of the kernel function $\kappa$. The choice of the kernel function is in this way analogous to choosing a map $\psi$.