Open Access
October 1997
Bandit problems with infinitely many arms
Donald A. Berry, Robert W. Chen, Alan Zame, David C. Heath, Larry A. Shepp
Ann. Statist. 25(5): 2103-2116 (October 1997). DOI: 10.1214/aos/1069362389

Abstract

We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernoulli arms, with $n \to \infty$. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution F. We first assume F to be the uniform distribution on (0, 1) and consider various extensions. In the uniform case we show that the best lower bound for the expected failure proportion is between $\sqrt{2}/\sqrt{n}$ and $2/\sqrt{n}$ and we exhibit classes of strategies that achieve the latter.
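
The setup described above can be simulated in a few lines. The sketch below is an illustration under stated assumptions, not the paper's method: each new arm's success probability is drawn from the uniform distribution on (0, 1), and play follows a simple rule in the spirit of the listed keywords (stay with a winner, switch on a loser). The function names and the `seed` parameter are hypothetical. The two reference values from the abstract are computed for comparison only; nothing here claims that this simple rule attains them.

```python
import math
import random


def simulate_stay_with_winner(n, seed=0):
    """Play n pulls and return the observed failure proportion.

    Assumed rule (illustrative only): keep the current arm after a
    success; after a failure, discard it and draw a fresh arm.
    """
    rng = random.Random(seed)
    p = rng.random()  # success probability of the current arm, Uniform(0, 1)
    failures = 0
    for _ in range(n):
        if rng.random() < p:
            continue  # success: stay with this arm
        failures += 1
        p = rng.random()  # failure: switch to a new, unseen arm
    return failures / n


def reference_rates(n):
    """Return (sqrt(2)/sqrt(n), 2/sqrt(n)), the two values quoted in the abstract."""
    return math.sqrt(2) / math.sqrt(n), 2 / math.sqrt(n)


if __name__ == "__main__":
    n = 10_000
    lower, upper = reference_rates(n)
    print("observed failure proportion:", simulate_stay_with_winner(n))
    print("abstract's reference values:", lower, upper)
```

Running the script with several values of `n` and `seed` gives a rough feel for how far a naive rule sits from the $1/\sqrt{n}$ scale quoted in the abstract.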

Citation

Donald A. Berry, Robert W. Chen, Alan Zame, David C. Heath, Larry A. Shepp. "Bandit problems with infinitely many arms." Ann. Statist. 25 (5) 2103-2116, October 1997. https://doi.org/10.1214/aos/1069362389

Information

Published: October 1997
First available in Project Euclid: 20 November 2003

zbMATH: 0881.62083
MathSciNet: MR1474085
Digital Object Identifier: 10.1214/aos/1069362389

Subjects:
Primary: 60F99, 62C25, 62L05

Keywords: bandit problems, dynamic allocation of Bernoulli processes, sequential experimentation, staying with a winner, switching with a loser

Rights: Copyright © 1997 Institute of Mathematical Statistics

Vol. 25 • No. 5 • October 1997