Open Access
The Learning Component of Dynamic Allocation Indices
John Gittins, You-Gan Wang
Ann. Statist. 20(3): 1625-1636 (September, 1992). DOI: 10.1214/aos/1176348788

Abstract

For a multiarmed bandit problem with exponential discounting, the optimal allocation rule is defined by a dynamic allocation index on each arm's state space. The index for an arm is equal to the expected immediate reward from the arm, with an upward adjustment reflecting any uncertainty about the prospects of obtaining rewards from the arm and the possibility of resolving that uncertainty by selecting the arm. Thus the learning component of the index is defined to be the difference between the index and the expected immediate reward. For two arms with the same expected immediate reward, the learning component should be larger for the arm whose reward rate is more uncertain. This is shown to be true for arms based on independent samples from a fixed distribution with an unknown parameter in the Bernoulli and normal cases, and similar results are obtained in other cases.
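
As a rough illustration of the learning component (not part of the paper, and not the authors' computational method), the sketch below approximates the Gittins index of a Bernoulli arm with a Beta(a, b) posterior using the standard retirement/calibration characterization, truncated at a finite horizon, and reports the index minus the posterior mean. The function name gittins_index_bernoulli and the parameters gamma, horizon, and tol are illustrative choices. For two posteriors with the same mean, the flatter (more uncertain) one should yield the larger learning component, in line with the Bernoulli result stated above.

```python
from functools import lru_cache


def gittins_index_bernoulli(a, b, gamma=0.9, horizon=50, tol=1e-5):
    """Bisection approximation to the Gittins index of a Bernoulli arm whose
    success probability has a Beta(a, b) posterior (illustrative sketch).

    The index is approximated as the known success rate `lam` of a calibrating
    standard arm at which one is indifferent between retiring on the standard
    arm and pulling the uncertain arm once more, with the optimal-stopping
    problem truncated after `horizon` pulls.
    """

    def pull_minus_retire(lam):
        @lru_cache(maxsize=None)
        def value(succ, fail, depth):
            # Optimal value with `depth` periods left: either retire now and
            # collect lam in each remaining period, or pull once more and
            # update the Beta(succ, fail) posterior on the observed outcome.
            if depth == 0:
                return 0.0
            retire = lam * (1.0 - gamma ** depth) / (1.0 - gamma)
            p = succ / (succ + fail)
            pull = (p * (1.0 + gamma * value(succ + 1, fail, depth - 1))
                    + (1.0 - p) * gamma * value(succ, fail + 1, depth - 1))
            return max(retire, pull)

        p = a / (a + b)
        pull = (p * (1.0 + gamma * value(a + 1, b, horizon - 1))
                + (1.0 - p) * gamma * value(a, b + 1, horizon - 1))
        retire = lam * (1.0 - gamma ** horizon) / (1.0 - gamma)
        return pull - retire

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pull_minus_retire(mid) > 0.0:
            lo = mid  # pulling still beats retiring, so the index exceeds mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    gamma = 0.9
    for a, b in [(1, 1), (10, 10)]:   # same posterior mean, different spread
        index = gittins_index_bernoulli(a, b, gamma=gamma)
        mean = a / (a + b)
        print(f"Beta({a},{b}): index ~ {index:.4f}, "
              f"learning component ~ {index - mean:.4f}")
```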

Citation


John Gittins, You-Gan Wang. "The Learning Component of Dynamic Allocation Indices." Ann. Statist. 20(3): 1625-1636, September 1992. https://doi.org/10.1214/aos/1176348788

Information

Published: September, 1992
First available in Project Euclid: 12 April 2007

zbMATH: 0760.62080
MathSciNet: MR1186269
Digital Object Identifier: 10.1214/aos/1176348788

Subjects:
Primary: 62C10
Secondary: 90C40, 93E20

Keywords: Dynamic allocation index, Gittins index, Multiarmed bandit, target processes

Rights: Copyright © 1992 Institute of Mathematical Statistics
