Open Access
Two-Armed Dirichlet Bandits with Discounting
Manas K. Chattopadhyay
Ann. Statist. 22(3): 1212-1221 (September, 1994). DOI: 10.1214/aos/1176325626

Abstract

Sequential selections are to be made from two independent stochastic processes, or "arms." At each stage we choose which arm to observe based on past selections and observations. The observations on arm $i$ are conditionally i.i.d. given their marginal distribution $P_i$, which has a Dirichlet process prior with parameter $\alpha_i$, $i = 1, 2$. Future observations are discounted: at stage $m$, the payoff is $a_m$ times the observation $Z_m$ at that stage. The discount sequence $A_n = (a_1, a_2, \cdots, a_n, 0, 0, \cdots)$ is a nonincreasing sequence of nonnegative numbers, where the "horizon" $n$ is finite. The objective is to maximize the total expected payoff $E(\sum^n_{m=1} a_m Z_m)$. It is shown that optimal strategies continue with an arm when it yields a sufficiently large observation, one larger than a "break-even observation." This generalizes results of Clayton and Berry, who considered two arms with one arm known and assumed $a_m = 1$ for all $m \leq n$.
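The decision structure described above can be sketched in code. This is a minimal illustration, not the paper's method: the per-arm observation streams and the break-even thresholds are supplied as hypothetical inputs, whereas in the paper the break-even observations are derived from the Dirichlet process posteriors and the remaining discount sequence.

```python
def discounted_payoff(observations, discounts):
    """Realized total payoff sum_m a_m * Z_m for one observation path."""
    return sum(a * z for a, z in zip(discounts, observations))

def play_break_even(arm_streams, discounts, break_even):
    """Stay-with-a-winner rule: keep the current arm while its latest
    observation exceeds the stage's break-even value, else switch arms.

    arm_streams: dict {0: [...], 1: [...]} of pre-drawn observations per arm
                 (hypothetical stand-in for draws from P_0, P_1).
    discounts:   the nonincreasing sequence (a_1, ..., a_n).
    break_even:  per-stage thresholds, assumed given here.
    Returns (observations, arms_played).
    """
    arm = 0                      # start arbitrarily with arm 0
    obs, arms = [], []
    idx = {0: 0, 1: 0}           # position within each arm's stream
    for m, a_m in enumerate(discounts):
        z = arm_streams[arm][idx[arm]]
        idx[arm] += 1
        obs.append(z)
        arms.append(arm)
        if m + 1 < len(discounts) and z <= break_even[m]:
            arm = 1 - arm        # below break-even: switch to the other arm
    return obs, arms

# Example: a low first draw on arm 0 triggers a switch to arm 1.
discounts = [1.0, 0.8, 0.6]
arm_streams = {0: [0.2, 0.9], 1: [0.7, 0.5]}
break_even = [0.5, 0.5, 0.5]
obs, arms = play_break_even(arm_streams, discounts, break_even)
# obs == [0.2, 0.7, 0.5], arms == [0, 1, 1]
# discounted_payoff(obs, discounts) == 1.0*0.2 + 0.8*0.7 + 0.6*0.5 == 1.06
```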

Citation


Manas K. Chattopadhyay. "Two-Armed Dirichlet Bandits with Discounting." Ann. Statist. 22 (3): 1212-1221, September 1994. https://doi.org/10.1214/aos/1176325626

Information

Published: September, 1994
First available in Project Euclid: 11 April 2007

zbMATH: 0818.62067
MathSciNet: MR1311973
Digital Object Identifier: 10.1214/aos/1176325626

Subjects:
Primary: 62L05
Secondary: 62C10

Keywords: Dirichlet bandits, Dirichlet process prior, one-armed bandits, sequential decisions, two-armed bandits

Rights: Copyright © 1994 Institute of Mathematical Statistics
