The Annals of Statistics
- Ann. Statist.
- Volume 44, Number 2 (2016), 660-681.
Batched bandit problems
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
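To make the batching constraint concrete, the following is a minimal simulation sketch and not the policy proposed in the paper. It assumes two Bernoulli arms, equally spaced batch end points, and a simple confidence-radius elimination rule; the function name `batched_two_armed_bandit`, the grid, and the radius are all hypothetical choices made for illustration. The only feature it shares with the setting described above is that the policy may change which arms it pulls only between batches.

```python
import math
import random


def batched_two_armed_bandit(means, horizon, num_batches, seed=0):
    """Simulate a two-armed Bernoulli bandit with decisions made only between batches.

    Returns (pseudo_regret, pulls_per_arm). Illustrative sketch only.
    """
    rng = random.Random(seed)
    # Assumption: equally spaced batch end points (a hypothetical grid).
    ends = [round(horizon * (k + 1) / num_batches) for k in range(num_batches)]
    sums = [0.0, 0.0]   # cumulative reward per arm
    pulls = [0, 0]      # number of pulls per arm
    active = [0, 1]     # arms still being explored
    t = 0
    for k, end in enumerate(ends):
        # Empirical means from data gathered in earlier batches only.
        est = [sums[a] / max(pulls[a], 1) for a in range(2)]
        if k == num_batches - 1 and len(active) == 2:
            # Final batch: commit to the empirically better arm.
            active = [0 if est[0] >= est[1] else 1]
        # Within a batch the set of active arms is fixed; alternate among them.
        for step in range(end - t):
            arm = active[step % len(active)]
            reward = 1.0 if rng.random() < means[arm] else 0.0
            sums[arm] += reward
            pulls[arm] += 1
        t = end
        if len(active) == 2:
            # Assumption: a simple confidence radius; eliminate the worse arm
            # once the empirical gap exceeds it.
            est = [sums[a] / pulls[a] for a in range(2)]
            radius = math.sqrt(2.0 * math.log(horizon) / min(pulls))
            if abs(est[0] - est[1]) > radius:
                active = [0 if est[0] > est[1] else 1]
    best = max(means)
    pseudo_regret = sum((best - means[a]) * pulls[a] for a in range(2))
    return pseudo_regret, pulls


if __name__ == "__main__":
    regret, pulls = batched_two_armed_bandit((0.9, 0.1), horizon=1000, num_batches=4)
    print("pulls per arm:", pulls, "pseudo-regret:", regret)
```

Varying `num_batches` while holding `horizon` fixed gives a rough feel for how the pseudo-regret of this toy rule depends on the number of batches; it makes no claim about the bounds established in the paper.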
Received: May 2015
Revised: August 2015
First available in Project Euclid: 17 March 2016
Perchet, Vianney; Rigollet, Philippe; Chassang, Sylvain; Snowberg, Erik. Batched bandit problems. Ann. Statist. 44 (2016), no. 2, 660--681. doi:10.1214/15-AOS1381. https://projecteuclid.org/euclid.aos/1458245731
- Supplement to “Batched bandit problems”. The supplementary material contains additional simulations, including some using real data.