Bandit Algorithms Book

Dear readers,

Nearly two years after starting to write the blog, we have at last completed a first draft of the book, which will be published by Cambridge University Press.
The book is available for free as a PDF and will remain so after publication. We’re grateful to Cambridge for allowing this.

Without further ado, here is the link.

Although we still have a few things we want to do, the manuscript is sufficiently polished to be useful. Of course, we would greatly appreciate any comments you might have, including typos, errors in the proofs, missing references, confusing explanations, or anything else you might notice. We will update the book periodically, so it would be helpful if you could quote the revision number on the cover when sending us your comments (banditalgs@gmail.com).

The manuscript includes a lot of material that is not in the blog. The last seven chapters are all new, covering combinatorial (semi-)bandits, non-stationary bandits, ranking, pure exploration, Bayesian methods, Thompson sampling, partial monitoring and an introduction to learning in Markov decision processes. The chapters based on blog posts have been cleaned up, and in many cases we have added significant depth. There is a lot of literature that we have not covered; some of the missing topics are discussed very briefly in the introduction to Part VII. It really is amazing how large the bandit literature has become, and we're sorry not to have found space for everything.
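To give a flavour of one of the new topics, here is a minimal Python sketch of Thompson sampling for a Bernoulli bandit with uniform priors. The two-armed example, the priors and the horizon are illustrative choices made for this post, not code taken from the book.

```python
import random

def thompson_sampling_bernoulli(arms, horizon):
    """Maintain a Beta(successes + 1, failures + 1) posterior for each arm,
    sample an index from every posterior and play the arm with the largest sample."""
    k = len(arms)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample from each arm's posterior.
        samples = [random.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = arms[arm]()            # observe a 0/1 reward from the chosen arm
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Illustrative example: two Bernoulli arms with means 0.4 and 0.6.
arms = [lambda: int(random.random() < 0.4),
        lambda: int(random.random() < 0.6)]
print(thompson_sampling_bernoulli(arms, horizon=10000))
```

For Bernoulli rewards the posterior update is just a pair of counters per arm, which is one reason Thompson sampling is so pleasant to implement.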

The book includes around 250 exercises, some of which have solutions. On average the exercises have been proofread less carefully than the rest of the book, so some caution is advised. The solutions to selected exercises are available here.

Finally, we’re very thankful for all the feedback we have already received, both on the blog and on early drafts of the book.

Table of Contents

  1. Bandits: A new beginning
  2. Finite-armed stochastic bandits: Warming up
  3. First steps: Explore-then-Commit
  4. The Upper Confidence Bound (UCB) Algorithm
  5. Optimality concepts and information theory
  6. More information theory and minimax lower bounds
  7. Instance dependent lower bounds
  8. Adversarial bandits
  9. High probability lower bounds
  10. Contextual bandits, prediction with expert advice and Exp4
  11. Stochastic Linear Bandits and UCB
  12. Ellipsoidal confidence sets for least-squares estimators
  13. Sparse linear bandits
  14. Lower bounds for stochastic linear bandits
  15. Adversarial linear bandits
  16. Adversarial linear bandits and the curious case of linear bandits on the unit ball

Bandit tutorial slides and update on book

This website has been quiet for some time, but we have not given up on bandits just yet.

First up, we recently gave a short tutorial at AAAI that covered the basics of finite-armed stochastic bandits and stochastic linear bandits. The slides are now available (part 1, part 2).

We mentioned at some point that these notes would become a book. We’re happy to say that the project is nearing completion. We hope to publish a reasonably polished draft in the next month or so.

Of course, the book contains all the content of the blog in a polished and extended form. There are also many new chapters. Some highlights are combinatorial bandits, non-stationary bandits, ranking, Bayesian methods (including Thompson sampling) and pure exploration. We also have two chapters that peek beyond the world of bandits, at partial monitoring and learning in Markov decision processes.

Once the draft is complete, we would love to have your feedback.