Open Access
2008 A penalized bandit algorithm
Damien Lamberton, Gilles Pagès
Electron. J. Probab. 13: 341-373 (2008). DOI: 10.1214/EJP.v13-489


We study a two-armed bandit recursive algorithm with penalty. We show that the algorithm converges towards its "target" although it always has a noiseless "trap". We then elucidate the rate of convergence. For some choices of the parameters, we obtain a central limit theorem in which the limit distribution is characterized as the unique stationary distribution of a Markov process with jumps.



Damien Lamberton, Gilles Pagès. "A penalized bandit algorithm." Electron. J. Probab. 13: 341-373, 2008.


Accepted: 10 March 2008; Published: 2008
First available in Project Euclid: 1 June 2016

zbMATH: 1206.62139
MathSciNet: MR2386736
Digital Object Identifier: 10.1214/EJP.v13-489

Primary: 62L20
Secondary: 68T05, 91B32, 91E40, 93C40

Keywords: convergence rate, learning, penalization, stochastic approximation, two-armed bandit algorithm
