The Annals of Applied Probability

Multi-armed bandits in discrete and continuous time

Haya Kaspi and Avishai Mandelbaum


Abstract

We analyze Gittins' Markovian model, as generalized by Varaiya, Walrand and Buyukkoc, in discrete and continuous time. The approach resembles Weber's modification of Whittle's, within the framework of both multi-parameter processes and excursion theory. It is shown that index-priority strategies are optimal, in concert with all the special cases that have been treated previously.

Article information

Source
Ann. Appl. Probab., Volume 8, Number 4 (1998), 1270-1290.

Dates
First available in Project Euclid: 9 August 2002

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1028903380

Digital Object Identifier
doi:10.1214/aoap/1028903380

Mathematical Reviews number (MathSciNet)
MR1661180

Zentralblatt MATH identifier
0940.60063

Subjects
Primary: 60G40: Stopping times; optimal stopping problems; gambling theory [See also 62L15, 91A60]
Secondary: 60J55: Local time and additive functionals; 60G44: Martingales with continuous parameter

Keywords
Multi-armed bandits; optional increasing paths; multiparameter processes; excursions; local times; dual predictable projection

Citation

Kaspi, Haya; Mandelbaum, Avishai. Multi-armed bandits in discrete and continuous time. Ann. Appl. Probab. 8 (1998), no. 4, 1270-1290. doi:10.1214/aoap/1028903380. https://projecteuclid.org/euclid.aoap/1028903380



References

  • [1] Azéma, J. (1985). Sur les fermés aléatoires. Séminaire de Probabilités XIX. Lecture Notes in Math. 1123 397-495. Springer, Berlin.
  • [2] Berry, D. A. and Fristedt, B. (1985). Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London.
  • [3] Cairoli, R. and Dalang, R. C. (1996). Sequential stochastic optimization. In Probability and Statistics. Wiley, New York.
  • [4] Dellacherie, C. and Meyer, P. A. (1978). Probabilities and Potentials. North-Holland, Amsterdam.
  • [5] El Karoui, N. and Karatzas, I. (1993). General Gittins index processes in discrete time. Proc. Nat. Acad. Sci. U.S.A. 90 1232-1236.
  • [6] El Karoui, N. and Karatzas, I. (1994). Dynamic allocation problems in continuous time. Ann. Appl. Probab. 4 255-286.
  • [7] El Karoui, N. and Karatzas, I. (1996). Synchronization and optimality for multi-armed bandit problems in continuous time. Unpublished manuscript.
  • [8] Gittins, J. C. (1989). Multi-armed Bandit Allocation Indices. Wiley, New York.
  • [9] Gittins, J. C. and Jones, D. M. (1974). A dynamic allocation index for the sequential design of experiments. In Progress in Statistics (J. Gani et al., eds.) 241-266. North-Holland, Amsterdam.
  • [10] Kaspi, H. and Maisonneuve, B. (1984/85). Predictable local times and exit systems. Séminaire de Probabilités XX. Lecture Notes in Math. 1204 95-100. Springer, Berlin.
  • [11] Kaspi, H. and Mandelbaum, A. (1994). Lévy bandits: multi-armed bandits driven by Lévy processes. Ann. Appl. Probab. 5 541-565.
  • [12] Mandelbaum, A. (1986). Discrete multiarmed bandits and multiparameter processes. Probab. Theory Related Fields 71 129-147.
  • [13] Mandelbaum, A. (1987). Continuous multi-armed bandits and multi-parameter processes. Ann. Probab. 15 1527-1556.
  • [14] Mandelbaum, A. and Vanderbei, R. J. (1981). Optimal stopping and supermartingales over partially ordered sets. Z. Wahrsch. Verw. Gebiete 57 253-264.
  • [15] Presman, E. L. and Sonin, I. N. (1990). Sequential Control with Incomplete Information: The Bayesian Approach to Multi-armed Bandit Problems. Academic Press, New York.
  • [16] Sharpe, M. (1988). General Theory of Markov Processes. Academic Press, New York.
  • [17] Snell, L. (1952). Applications of martingale systems theorems. Trans. Amer. Math. Soc. 73 293-312.
  • [18] Varaiya, P., Walrand, J. and Buyukkoc, C. (1985). Extensions of the multi-armed bandit problem. The discounted case. IEEE Trans. Automat. Control AC-30 426-439.
  • [19] Walsh, J. B. (1981). Optional increasing paths. Colloque ENST-CNET: Lecture Notes in Math. 863 172-201. Springer, Berlin.
  • [20] Weber, R. (1992). On the Gittins index for multi-armed bandits. Ann. Appl. Probab. 2 1024-1035.
  • [21] Whittle, P. (1980). Multi-armed bandits and the Gittins index. J. Roy. Statist. Soc. Ser. B 42 143-149.