We study a two-armed bandit recursive algorithm with penalty. We show that the algorithm converges towards its "target" although it always has a noiseless "trap". Then, we elucidate the rate of convergence. For some choices of the parameters, we obtain a central limit theorem in which the limit distribution is characterized as the unique stationary distribution of a Markov process with jumps.
Damien Lamberton, Gilles Pagès. "A penalized bandit algorithm." Electron. J. Probab. 13: 341-373, 2008. https://doi.org/10.1214/EJP.v13-489