Open Access
Bayesian Nonparametric Bandits
Murray K. Clayton, Donald A. Berry
Ann. Statist. 13(4): 1523-1534 (December, 1985). DOI: 10.1214/aos/1176349753

Abstract

Sequential selections are to be made from two stochastic processes, or "arms." At each stage the arm selected for observation depends on past observations. The objective is to maximize the expected sum of the first $n$ observations. For arm 1 the observations are identically distributed with probability measure $P$, and for arm 2 the observations have probability measure $Q$; $P$ is a Dirichlet process and $Q$ is known. An equivalent problem is deciding sequentially when to stop sampling from an unknown population. Optimal strategies are shown to continue sampling if the current observation is sufficiently large. A simple form of such a rule is expressed in terms of a degenerate Dirichlet process which is related to $P$.
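
To make the setup concrete, the following is a minimal Python sketch (not from the paper) of the posterior-predictive mean of the unknown arm under a Dirichlet-process prior DP(M * F0), combined with a myopic stay-or-switch rule. The values of M, mu0, and lam, the Beta payoff distribution, and the helper predictive_mean are all illustrative assumptions; the paper's optimal strategies are forward-looking and would continue sampling in some cases where this myopic rule stops.

    import numpy as np

    rng = np.random.default_rng(0)

    # One-armed Dirichlet bandit (illustrative sketch, not the paper's optimal rule).
    # Arm 1: payoffs from an unknown distribution P with prior P ~ DP(M * F0).
    # Arm 2: known distribution Q; only its mean lam matters for decisions.
    M = 2.0      # prior mass (concentration) of the Dirichlet process (assumed)
    mu0 = 0.5    # mean of the base measure F0, taken here as Uniform(0, 1) (assumed)
    lam = 0.48   # known expected payoff of arm 2 (assumed)
    n = 50       # horizon: total number of observations

    def predictive_mean(obs):
        # Under P ~ DP(M * F0), the posterior-predictive mean after observing obs
        # is the weighted average (M * mu0 + sum(obs)) / (M + len(obs)).
        return (M * mu0 + sum(obs)) / (M + len(obs))

    draw_arm1 = lambda: rng.beta(2.0, 3.0)  # hidden truth: P = Beta(2, 3), mean 0.4

    obs, total = [], 0.0
    for t in range(n):
        if predictive_mean(obs) >= lam:
            x = draw_arm1()     # sample the unknown arm while it looks at least as good
            obs.append(x)
            total += x
        else:
            total += lam        # switch to the known arm; its payoffs carry no
                                # information, so the predictive mean is frozen
                                # and the switch is permanent

    print(f"sampled unknown arm {len(obs)} times; payoff proxy = {total:.2f}")

Because the known arm is uninformative, a switch to it is never reversed, which is why the problem is equivalent to deciding when to stop sampling from the unknown population, as the abstract notes.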

Citation


Murray K. Clayton, Donald A. Berry. "Bayesian Nonparametric Bandits." Ann. Statist. 13 (4) 1523-1534, December, 1985. https://doi.org/10.1214/aos/1176349753

Information

Published: December, 1985
First available in Project Euclid: 12 April 2007

zbMATH: 0587.62151
MathSciNet: MR811507
Digital Object Identifier: 10.1214/aos/1176349753

Subjects:
Primary: 62L05
Secondary: 62L15

Keywords: Dirichlet bandits, nonparametric decisions, one-armed bandits, optimal stopping, sequential decisions, two-armed bandits

Rights: Copyright © 1985 Institute of Mathematical Statistics
