Open Access
Linear Thompson sampling revisited
Marc Abeille, Alessandro Lazaric
Electron. J. Statist. 11(2): 5165-5197 (2017). DOI: 10.1214/17-EJS1341SI

Abstract

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $\widetilde{O}(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated with optimistic parameters controls it. Thus we show that TS can be seen as a generic randomized algorithm whose sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.
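The mechanism the abstract describes can be sketched in code: at each round, TS samples a perturbed parameter from a distribution centered at the regularized least-squares estimate (with covariance inflated by the inverse design matrix), then pulls the arm that is optimal for the sampled parameter. The sketch below is illustrative only; the function name, the inflation parameter `v`, and the toy arm set are assumptions, not taken from the paper.

```python
import numpy as np

def linear_thompson_sampling(arms, theta_star, T, noise_std=0.1, reg=1.0, v=1.0, seed=0):
    """Sketch of linear TS: sample a parameter around the regularized
    least-squares (RLS) estimate, then act greedily for that sample."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = reg * np.eye(d)      # regularized design matrix V_t
    b = np.zeros(d)          # running sum of x_s * r_s
    rewards = []
    for t in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b                    # RLS estimate
        # Sampling distribution: Gaussian centered at theta_hat with
        # covariance v^2 * V^{-1}, chosen so the sampled parameter is
        # "optimistic" with a fixed probability.
        theta_tilde = rng.multivariate_normal(theta_hat, v**2 * V_inv)
        x = arms[np.argmax(arms @ theta_tilde)]  # greedy arm for the sample
        r = x @ theta_star + noise_std * rng.normal()
        V += np.outer(x, x)
        b += r * x
        rewards.append(r)
    return np.array(rewards)

# Toy instance (hypothetical): 5 unit-norm arms in R^2, true parameter theta_star.
angles = np.linspace(0, np.pi, 5)
arms = np.stack([np.cos(angles), np.sin(angles)], axis=1)
theta_star = np.array([1.0, 0.0])
rewards = linear_thompson_sampling(arms, theta_star, T=500)
```

The key design choice, as the abstract notes, is that the perturbation scale is fixed in advance so that optimism holds with constant probability at every round; this frequentist reading of TS is what costs the extra $\sqrt{d}$ factor relative to a UCB-style confidence-set construction.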

Citation


Marc Abeille, Alessandro Lazaric. "Linear Thompson sampling revisited." Electron. J. Statist. 11(2): 5165-5197, 2017. https://doi.org/10.1214/17-EJS1341SI

Information

Received: 1 June 2017; Published: 2017
First available in Project Euclid: 15 December 2017

zbMATH: 06825043
MathSciNet: MR3738208
Digital Object Identifier: 10.1214/17-EJS1341SI

Keywords: Linear bandit, Thompson sampling
