Abstract
Two independent Bernoulli processes (arms) have unknown success probabilities $\rho$ and $\lambda$. The initial (a priori) information about $\rho$ and $\lambda$ is expressed by the probability distributions $dR(\rho) = C_R \rho^{r_0}(1 - \rho)^{r_0'} \, d\mu(\rho)$ for the right arm and $dL(\lambda) = C_L \lambda^{l_0}(1 - \lambda)^{l_0'} \, d\mu(\lambda)$ for the left arm, where $\mu$ is an arbitrary measure on the unit interval. A specified number $n$ of observations is made sequentially, with the arm selected at each stage depending on the previous observations and the initial information. A conjecture of Berry states that if the initial information about the right arm (given by $r_0 + r_0'$) is not greater than that about the left arm (given by $l_0 + l_0'$), and the initial expected value of $\rho$ is not less than that of $\lambda$, then for any $n$ the advantage (in terms of expected number of successes) of taking the first observation on the right arm is never less than that of taking it on the left arm. A proof of this conjecture is given in this paper.
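The setting can be explored numerically. The following is a minimal sketch, not the paper's method: it assumes that $\mu$ is Lebesgue measure (so the priors are ordinary Beta distributions) and that the exponents are non-negative integers, and it evaluates the finite-horizon problem by backward induction. The function names `post_mean`, `value`, `pull_right`, `pull_left`, and `advantage_right` are illustrative and do not come from the paper.

```python
from fractions import Fraction
from functools import lru_cache


def post_mean(a, b):
    # Mean of the density proportional to x^a (1 - x)^b on [0, 1].
    # ASSUMPTION: mu is Lebesgue measure, so this is Beta(a + 1, b + 1).
    return Fraction(a + 1, a + b + 2)


@lru_cache(maxsize=None)
def value(n, r, rp, l, lp):
    # Maximal expected number of successes in n remaining observations,
    # given current exponents (r, rp) for the right arm and (l, lp) for the left.
    if n == 0:
        return Fraction(0)
    return max(pull_right(n, r, rp, l, lp), pull_left(n, r, rp, l, lp))


def pull_right(n, r, rp, l, lp):
    # Observe the right arm now, then continue optimally.
    p = post_mean(r, rp)
    success = 1 + value(n - 1, r + 1, rp, l, lp)
    failure = value(n - 1, r, rp + 1, l, lp)
    return p * success + (1 - p) * failure


def pull_left(n, r, rp, l, lp):
    # Observe the left arm now, then continue optimally.
    q = post_mean(l, lp)
    success = 1 + value(n - 1, r, rp, l + 1, lp)
    failure = value(n - 1, r, rp, l, lp + 1)
    return q * success + (1 - q) * failure


def advantage_right(n, r0, r0p, l0, l0p):
    # Expected successes when starting on the right arm minus
    # expected successes when starting on the left arm.
    return pull_right(n, r0, r0p, l0, l0p) - pull_left(n, r0, r0p, l0, l0p)


if __name__ == "__main__":
    # Right arm: less information (1 <= 3) and a larger prior mean (2/3 >= 3/5).
    print(advantage_right(1, 1, 0, 2, 1))  # prints 1/15
    for n in range(1, 8):
        print(n, advantage_right(n, 1, 0, 2, 1))
```

Exact rational arithmetic is used so that the sign of the advantage is not blurred by rounding. Under the stated assumptions, the hypotheses of the conjecture hold for the example arms, so the printed advantages should all be non-negative.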
Citation
V. M. Joshi. "A Conjecture of Berry Regarding A Bernoulli Two-Armed Bandit." Ann. Statist. 3 (1) 189 - 202, January, 1975. https://doi.org/10.1214/aos/1176343007
Information