Csaba Szepesvari

Adversarial linear bandits and the curious case of the unit ball

Posted on November 25, 2016 (updated March 17, 2019)

According to the main result of the previous post, given any finite action set $\cA$ with $K$ actions $a_1,\dots,a_K\in \R^d$, no matter how an adversary selects the loss vectors $y_1,\dots,y_n\in \R^d$, as long as the action losses $\ip{a_k,y_t}$ are in …
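To make the objects in this excerpt concrete, here is a minimal sketch, assuming the $K$ actions are stored as the rows of a NumPy array and the adversary's loss vectors $y_1,\dots,y_n$ as the rows of another. The helper names `action_losses` and `regret` are hypothetical, and the regret computed is the usual comparison of the incurred loss $\sum_t \ip{A_t,y_t}$ against the best fixed action in hindsight, $\min_k \sum_t \ip{a_k,y_t}$. Actions are indexed from $0$ in the code rather than from $1$.

```python
import numpy as np

def action_losses(actions: np.ndarray, y_t: np.ndarray) -> np.ndarray:
    """Losses <a_k, y_t> of all K actions in a single round.

    actions: shape (K, d), row k is the action a_k in R^d.
    y_t:     shape (d,), the loss vector chosen by the adversary in round t.
    """
    return actions @ y_t

def regret(actions: np.ndarray, ys: np.ndarray, chosen: np.ndarray) -> float:
    """Regret of a sequence of plays against the best fixed action in hindsight.

    ys:     shape (n, d), the adversary's loss vectors y_1, ..., y_n.
    chosen: shape (n,), index of the action played in each round.
    """
    # Loss actually incurred: sum_t <A_t, y_t>.
    incurred = np.sum(actions[chosen] * ys)
    # Best fixed action in hindsight: min_k sum_t <a_k, y_t>.
    best = (actions @ ys.sum(axis=0)).min()
    return float(incurred - best)
```

For example, with the two unit vectors of $\R^2$ as actions, loss vectors $(1,0),(1,0),(0,1)$ and plays $a_1,a_1,a_2$, every round costs $1$ while the second action would have lost only $1$ in total, so `regret` returns `2.0`.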

Adversarial linear bandits

Posted on November 24, 2016 (updated March 17, 2019)

In the next few posts we will consider adversarial linear bandits, which, up to a crude first approximation, can be thought of as the adversarial version of stochastic linear bandits. The discussion of the exact nature of the relationship between …

Sparse linear bandits

Posted on November 21, 2016

In the last two posts we considered stochastic linear bandits, where the actions are vectors in the $d$-dimensional Euclidean space. According to our previous calculations, under the condition that the expected rewards of all the actions are in a fixed …

Stochastic Linear Bandits and UCB

Posted on October 19, 2016 (updated March 17, 2019)

Recall that in the adversarial contextual $K$-action bandit problem, at the beginning of each round $t$ a context $c_t\in \Ctx$ is observed. The idea is that the context $c_t$ may help the learner to choose a better action. This led …
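The interaction described here can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the algorithm from the post: the adversary's rewards are fixed in advance as a table `rewards[t][k]`, the learner is given as two hypothetical callables `act` (context to action index) and `update` (receives the observed feedback), and the string contexts are purely illustrative.

```python
from typing import Callable, Hashable, Sequence

def play_contextual_bandit(
    contexts: Sequence[Hashable],
    rewards: Sequence[Sequence[float]],
    act: Callable[[Hashable], int],
    update: Callable[[Hashable, int, float], None] = lambda c, a, x: None,
) -> float:
    """Run the contextual K-action bandit protocol for n = len(contexts) rounds.

    rewards[t][k] is the reward the adversary assigned to action k in round t;
    the learner only ever observes rewards[t][a_t] for the action it chose.
    Returns the learner's total reward.
    """
    total = 0.0
    for c_t, x_t in zip(contexts, rewards):
        a_t = act(c_t)            # the context c_t is observed before choosing
        reward = x_t[a_t]         # bandit feedback: only the chosen action's reward
        update(c_t, a_t, reward)  # the learner may adapt using (c_t, a_t, reward)
        total += reward
    return total
```

With contexts `["sports", "news", "sports"]` and rewards `[[1, 0], [0, 1], [1, 0]]`, a policy that plays action `0` on `"sports"` and action `1` otherwise collects `3.0`, while the context-blind policy that always plays `0` collects `2.0`. This is exactly the sense in which the context may help the learner choose a better action.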

Contextual Bandits and the Exp4 Algorithm

Posted on October 14, 2016

In most bandit problems, some additional information is likely to be available at the beginning of each round, and this information can often help with choosing actions. For example, in a web article recommendation system, where the goal …

Adversarial bandits

Posted on October 1, 2016 (updated October 16, 2019)

A stochastic bandit with $K$ actions is completely determined by the distributions of rewards, $P_1,\dots,P_K$, of the respective actions. In particular, in round $t$, the distribution of the reward $X_t$ received by a learner choosing action $A_t\in [K]$ is $P_{A_t}$, …
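The point that the environment is nothing more than the list $P_1,\dots,P_K$ can be made concrete in a few lines. In this minimal sketch each $P_k$ is represented by a sampler function (an assumption of the sketch; any representation that can draw a reward would do), Bernoulli rewards are used only as an example, and the names `StochasticBandit`, `pull` and `bernoulli` are hypothetical. Actions are indexed $0,\dots,K-1$ instead of $[K]=\{1,\dots,K\}$.

```python
import random
from typing import Callable, List

# A reward distribution P_k, represented as a function that draws one sample.
Sampler = Callable[[random.Random], float]

class StochasticBandit:
    """K-action stochastic bandit, fully specified by P_1, ..., P_K."""

    def __init__(self, samplers: List[Sampler], seed: int = 0):
        self.samplers = samplers
        self.rng = random.Random(seed)  # seeded for reproducibility

    @property
    def K(self) -> int:
        return len(self.samplers)

    def pull(self, a_t: int) -> float:
        """Return the reward X_t for action A_t = a_t, drawn from P_{a_t}."""
        return self.samplers[a_t](self.rng)

def bernoulli(p: float) -> Sampler:
    """Sampler for a Bernoulli(p) reward distribution."""
    return lambda rng: 1.0 if rng.random() < p else 0.0
```

Because the rewards of an action are independent draws from the same distribution, averaging many pulls of, say, `StochasticBandit([bernoulli(0.3), bernoulli(0.7)]).pull(1)` gives a value close to the mean $0.7$ of $P_2$.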

More information theory and minimax lower bounds

Posted on September 28, 2016

Continuing the previous post, we prove the claimed minimax lower bound. We start with a useful result that quantifies the difficulty of identifying whether or not an observation is drawn from similar distributions $P$ and $Q$ defined over the same …

Finite-armed stochastic bandits: Warming up

Posted on September 4, 2016

On Monday last week we did not have a lecture, so the lectures spilled over to this week’s Monday. This week was devoted to building up foundations, and this post will summarize how far we got. The post is pretty …

Bandits: A new beginning

Posted on September 4, 2016

Dear Interested Reader, together with Tor, we have worked a lot on bandit problems in the past and developed a true passion for them. Under pressure from some friends and students (and a potential publisher), and also just to …

Posted on August 1, 2016 (updated March 28, 2019)

Bandits: A new beginning
Finite-armed stochastic bandits: Warming up
First steps: Explore-then-Commit
The Upper Confidence Bound (UCB) Algorithm
Optimality concepts and information theory
More information theory and minimax lower bounds
Instance dependent lower bounds
Adversarial bandits
High probability lower bounds
…

Author: Csaba Szepesvari