Electronic Journal of Statistics
- Electron. J. Statist.
- Volume 10, Number 1 (2016), 242-270.
Randomized allocation with arm elimination in a bandit problem with covariates
Motivated by applications in personalized web services and clinical research, we consider a multi-armed bandit problem in which the mean reward of each arm is associated with covariates. We propose a multi-stage algorithm, randomized allocation with arm elimination, that combines flexibility in modeling the reward functions with a theoretical guarantee of the minimax rate for cumulative regret. When the smoothness parameter of the reward functions is unknown, the algorithm is equipped with a histogram-based smoothness parameter selector that uses Lepski’s method, and it is shown to maintain the minimax regret rate up to a logarithmic factor under a “self-similarity” condition.
Received: October 2014
First available in Project Euclid: 17 February 2016
Qian, Wei; Yang, Yuhong. Randomized allocation with arm elimination in a bandit problem with covariates. Electron. J. Statist. 10 (2016), no. 1, 242-270. doi:10.1214/15-EJS1104. https://projecteuclid.org/euclid.ejs/1455715962