The Annals of Applied Probability

Levy Bandits: Multi-Armed Bandits Driven by Levy Processes

Haya Kaspi and Avi Mandelbaum

Full-text: Open access


Levy bandits are multi-armed bandits driven by Levy processes. As anticipated from existing research, Levy bandits are optimally controlled by an index strategy: One can associate with each arm an index function of its state, and optimal strategies are those that allocate time to arms whose states have the largest index. Furthermore, the index function of an arm is calculated independently of the other arms, and the optimal reward can be expressed in terms of the indices. Somewhat less anticipated, however, is the fact that the index function of an arm, driven by a Levy process, has a representation in terms of the decreasing ladder sets and the exit system of its Levy driver. Moreover, the Wiener-Hopf factorization of the Levy exponents of an arm can be used to obtain the characteristic function of some excursion law, through which the index of the arm is defined. We use this factorization to calculate explicitly index functions and optimal rewards of some interesting Levy bandits, rediscovering along the way that local time naturally quantifies switching in continuous time.
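The index strategy described in the abstract can be illustrated with a minimal discrete-time sketch. Everything here is an assumption made for illustration: the names `index_policy_step`, `simulate` and `brownian_increment` are hypothetical, the arms are driven by Brownian motion with drift (one simple example of a Levy process), and the index function is left as a user-supplied placeholder rather than the excursion-based index derived in the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

IndexFn = Callable[[float], float]
Stepper = Callable[[float, random.Random], float]


def index_policy_step(states: Sequence[float], index_fns: Sequence[IndexFn]) -> int:
    """Return the arm whose current state has the largest index.

    Each index is computed from that arm's own state only,
    independently of the other arms. Ties go to the lowest arm number.
    """
    indices = [f(x) for f, x in zip(index_fns, states)]
    return max(range(len(indices)), key=indices.__getitem__)


def brownian_increment(mu: float, sigma: float) -> Stepper:
    """Increment of Brownian motion with drift mu and volatility sigma over dt."""
    def step(dt: float, rng: random.Random) -> float:
        return mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return step


def simulate(
    states: Sequence[float],
    index_fns: Sequence[IndexFn],
    steppers: Sequence[Stepper],
    n_steps: int,
    dt: float,
    rng: random.Random,
) -> Tuple[List[float], List[float]]:
    """Discretized index strategy: at each step only the chosen arm evolves."""
    current = list(states)
    time_on_arm = [0.0] * len(current)
    for _ in range(n_steps):
        k = index_policy_step(current, index_fns)
        current[k] += steppers[k](dt, rng)  # the other arms stay frozen
        time_on_arm[k] += dt
    return current, time_on_arm


if __name__ == "__main__":
    rng = random.Random(0)
    placeholder_index = lambda x: x  # hypothetical index, NOT the paper's formula
    final, spent = simulate(
        [0.0, 0.0],
        [placeholder_index, placeholder_index],
        [brownian_increment(0.1, 1.0), brownian_increment(-0.1, 1.0)],
        n_steps=1000,
        dt=0.001,
        rng=rng,
    )
    print(final, spent)
```

The time step `dt` is only a discretization device. In the continuous-time setting of the paper, allocation is not a sequence of discrete choices, and switching between arms is quantified through local time, which this sketch does not attempt to reproduce.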

Article information

Ann. Appl. Probab., Volume 5, Number 2 (1995), 541-565.

First available in Project Euclid: 19 April 2007

Digital Object Identifier: doi:10.1214/aoap/1177004777


Primary: 60J30
Secondary: 60G40: Stopping times; optimal stopping problems; gambling theory [See also 62L15, 91A60]
Secondary: 60J55: Local time and additive functionals

Keywords: Levy processes; excursions; local time; Wiener-Hopf factorization; multiparameter processes; multiarmed bandits; optional increasing path


Kaspi, Haya; Mandelbaum, Avi. Levy Bandits: Multi-Armed Bandits Driven by Levy Processes. Ann. Appl. Probab. 5 (1995), no. 2, 541-565. doi:10.1214/aoap/1177004777.
