Abstract
We consider a bandit problem consisting of a sequence of $n$ choices from an infinite number of Bernoulli arms, with $n \to \infty$. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution $F$. We first assume $F$ to be the uniform distribution on $(0, 1)$ and then consider various extensions. In the uniform case we show that the best lower bound for the expected failure proportion is between $\sqrt{2}/\sqrt{n}$ and $2/\sqrt{n}$, and we exhibit classes of strategies that achieve the latter.
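To make the setting concrete, the following is a minimal Monte Carlo sketch of the uniform case. The strategy it simulates is an assumption chosen for illustration, not a rule taken from the abstract: draw a fresh arm, keep it while it succeeds, discard it at its first failure, and commit permanently to the first arm that produces `m` successes in a row. The names `simulate_run_strategy` and `failure_proportion`, and the choice `m` near $\sqrt{n}$, are hypothetical.

```python
import math
import random


def simulate_run_strategy(n, m, rng):
    """Play n pulls and return the number of failures.

    Illustrative strategy (an assumption, not quoted from the paper):
    each new arm has success probability p ~ Uniform(0, 1); an arm is
    dropped at its first failure unless it has already produced m
    successes in a row, in which case it is used for all remaining pulls.
    """
    failures = 0
    pulls = 0
    while pulls < n:
        p = rng.random()      # hidden Bernoulli parameter of a fresh arm
        streak = 0            # successes observed so far on this arm
        committed = False     # True once the arm has shown m straight successes
        while pulls < n:
            pulls += 1
            if rng.random() < p:
                streak += 1
                if streak >= m:
                    committed = True
            else:
                failures += 1
                if not committed:
                    break     # discard this arm and draw a new one
    return failures


def failure_proportion(n, m, trials, seed=0):
    """Average failure proportion over independent simulated runs."""
    rng = random.Random(seed)
    total = sum(simulate_run_strategy(n, m, rng) for _ in range(trials))
    return total / (trials * n)


if __name__ == "__main__":
    n = 2500
    m = int(math.sqrt(n))
    print("simulated failure proportion:", failure_proportion(n, m, trials=200))
    print("sqrt(2)/sqrt(n):", math.sqrt(2) / math.sqrt(n))
    print("2/sqrt(n):      ", 2 / math.sqrt(n))
```

Running the script prints the simulated proportion next to the two bounds quoted in the abstract. Those bounds are asymptotic, so at finite $n$ the comparison is only indicative.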
Citation
Donald A. Berry, Robert W. Chen, Alan Zame, David C. Heath, Larry A. Shepp. "Bandit problems with infinitely many arms." Ann. Statist. 25 (5) 2103-2116, October 1997. https://doi.org/10.1214/aos/1069362389