Open Access
Reinforcement learning from comparisons: Three alternatives are enough, two are not
Benoît Laslier, Jean-François Laslier
Ann. Appl. Probab. 27(5): 2907-2925 (October 2017). DOI: 10.1214/16-AAP1271

Abstract

This paper deals with two generalizations of the Pólya urn model where, instead of sampling one ball from the urn at each time step, we sample two or three balls. The processes are defined on the basis of the problem of finding the best alternative using pairwise comparisons which are not necessarily transitive: they can be thought of as evolutionary processes that tend to reinforce currently efficient alternatives. The two processes exhibit different behaviors: with three balls sampled, we prove almost sure convergence towards the unique optimal solution of the comparisons problem, while, in some cases, the process with two balls sampled almost surely has no limit. This is an example of a natural reinforcement model without exchangeability whose asymptotic behavior can be precisely characterized.
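The abstract does not spell out the reinforcement rule, so the following is only a hypothetical sketch of one natural dynamics consistent with its description: balls are colored by alternative, a few balls are drawn in proportion to current counts, and an alternative that beats every other distinct alternative in the draw (under a fixed, possibly intransitive tournament `beats`) receives an extra ball. The function name, parameters, and the specific rule are illustrative assumptions, not the authors' definition.

```python
import random

def simulate_urn(beats, steps=5000, sample_size=3, seed=0):
    """Simulate a hypothesized multi-ball urn process.

    beats[a][b] is True iff alternative a beats alternative b in the
    (possibly intransitive) tournament. At each step, `sample_size` balls
    are drawn with probabilities proportional to current counts; if one
    sampled alternative beats every other distinct sampled alternative,
    one ball of its colour is added. (Assumed rule, for illustration only.)
    """
    rng = random.Random(seed)
    n = len(beats)
    counts = [1] * n  # start with one ball of each colour
    for _ in range(steps):
        sample = set(rng.choices(range(n), weights=counts, k=sample_size))
        for a in sample:
            if all(a == b or beats[a][b] for b in sample):
                counts[a] += 1  # reinforce the local winner
                break
    return counts

# Cyclic (intransitive) tournament: 0 beats 1, 1 beats 2, 2 beats 0.
beats = [
    [False, True, False],
    [False, False, True],
    [True, False, False],
]
print(simulate_urn(beats))
```

With `sample_size=3` and the cycle above, a draw containing all three colours has no winner and adds nothing, while draws of one or two distinct colours reinforce as in a pairwise comparison; varying `sample_size` between 2 and 3 gives a toy version of the two processes contrasted in the paper.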

Citation


Benoît Laslier, Jean-François Laslier. "Reinforcement learning from comparisons: Three alternatives are enough, two are not." Ann. Appl. Probab. 27(5): 2907-2925, October 2017. https://doi.org/10.1214/16-AAP1271

Information

Received: 1 June 2016; Revised: 1 October 2016; Published: October 2017
First available in Project Euclid: 3 November 2017

zbMATH: 1379.60081
MathSciNet: MR3719949
Digital Object Identifier: 10.1214/16-AAP1271

Subjects:
Primary: 60J20, 91A22, 91E40

Keywords: learning, tournament, urn process

Rights: Copyright © 2017 Institute of Mathematical Statistics
