The Annals of Mathematical Statistics

On an A.P.O. Rule in Sequential Estimation with Quadratic Loss

Peter J. Bickel and Joseph A. Yahav


Consider the problem of Bayesian sequential estimation of a real parameter $\theta$ with quadratic loss and fixed cost $c$ per observation. It is well known (cf. [1], [2]) that, under simple regularity conditions, this problem reduces to the following one. If $Z_1, Z_2, \cdots, Z_n, \cdots$ are the observations (independent and identically distributed given $\theta$), let \begin{equation*}\tag{1.1}Y_n = \operatorname{Var} (\theta\mid Z_1, \cdots, Z_n),\end{equation*} the posterior variance of $\theta$, and \begin{equation*}\tag{1.2}X_n(c) = Y_n + nc.\end{equation*} The problem is then to find a stopping time $s(c)$ such that $E(X_{s(c)}(c)) = \inf \{E(X_t(c)):t \in T\}$, where $T$ is the set of all stopping times. In general, although $s(c)$ can usually be shown to exist, finding it in explicit form is difficult. In [2] we proposed the following stopping time $\tilde{t}(c)$ for this problem: "Stop as soon as $Y_n \leqq c(n + 1)$". We showed in [2] (generalized in [3]) that under some regularity conditions this rule is asymptotically pointwise optimal (A.P.O.), i.e., \begin{equation*}\tag{1.3}\lim_{c\rightarrow 0} X_{\tilde{t}(c)}(c)\lbrack X(c)\rbrack^{-1} = 1 \quad \operatorname{a.s.},\end{equation*} where \begin{equation*}\tag{1.4}X(c) = \inf_nX_n(c).\end{equation*} In fact, we proved that \begin{equation*}\tag{1.5}X_{\tilde{t}(c)}(c) = 2c^{\frac{1}{2}}V^{\frac{1}{2}}(\theta) + o(c^{\frac{1}{2}}) \quad \operatorname{a.s.}\end{equation*} and \begin{equation*}\tag{1.6}X_{\tilde{t}(c)}(c) - X(c) = o(c^{\frac{1}{2}}) \quad \operatorname{a.s.},\end{equation*} where $V(\theta)$ is the reciprocal of the Fisher information number.
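The form of the rule and the leading term in (1.5) can be motivated by an informal calculation. This is only a heuristic sketch, not the argument of [2]: it assumes that $Y_n$ may be replaced by the approximation $V(\theta)n^{-1}$ and that $n$ may be treated as a continuous variable, and the symbol $n^*$ is introduced here for illustration. Under these assumptions, \begin{equation*}\frac{d}{dn}\lbrack V(\theta)n^{-1} + nc\rbrack = -V(\theta)n^{-2} + c = 0 \quad \text{at} \quad n^* = \lbrack V(\theta)/c\rbrack^{\frac{1}{2}},\end{equation*} so that \begin{equation*}V(\theta)(n^*)^{-1} + n^*c = 2c^{\frac{1}{2}}V^{\frac{1}{2}}(\theta).\end{equation*} Under the same approximation the condition $Y_n \leqq c(n + 1)$ reads $V(\theta) \leqq cn(n + 1)$, which first holds for $n$ close to $n^*$ when $c$ is small.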
Later (in [3]) we showed, under some additional conditions, that $\tilde{t}(c)$ is asymptotically optimal, i.e., that \begin{equation*}\tag{1.7}\lim_{c\rightarrow 0}\lbrack E(X_{s(c)}(c))\rbrack\lbrack E(X_{\tilde{t}(c)}(c))\rbrack^{-1} = 1,\end{equation*} and in fact that \begin{equation*}\tag{1.8}E(X(c)) = 2c^{\frac{1}{2}}E(V^{\frac{1}{2}}(\theta)) + o(c^{\frac{1}{2}})\end{equation*} and \begin{equation*}\tag{1.9}E(X_{\tilde{t}(c)}(c)) - E(X(c)) = o(c^{\frac{1}{2}}).\end{equation*} In this paper we seek to refine the term $o(c^{\frac{1}{2}})$ in (1.5)-(1.6) and (1.8)-(1.9). Our analysis, as in our previous work, is based on the asymptotic properties of $Y_n$. We showed in [2] and [4] that \begin{equation*}\tag{1.10}Y_n = V(\theta) n^{-1} + R_n,\end{equation*} where $R_n = o(n^{-1})$ a.s. In [4] we further showed that, under suitable conditions, \begin{equation*}\tag{1.11}Y_n = V(\theta)n^{-1} + S_n(\theta)n^{-2} + R'_n\end{equation*} a.s., where $R'_n = o(n^{-3/2})$ and \begin{equation*}\tag{1.12}S_n(\theta) = \sum^n_{i=1} W_i(\theta),\end{equation*} where the $W_i$ are independent and identically distributed with mean $0$ given $\theta$. If $W_1(\theta)$ has a second moment and is nondegenerate, the law of the iterated logarithm enables us to conclude that \begin{equation*}\tag{1.13}R_n = O(n^{-3/2}\lbrack\log\log n\rbrack^{\frac{1}{2}}) \quad \operatorname{a.s.}\end{equation*} This suggests Theorem 2.1, which asserts that if (1.13) holds then \begin{equation*}\tag{1.14} X_{\tilde{t}(c)}(c) - X(c) = O(c^{3/4-\epsilon})\end{equation*} a.s. for all $\epsilon > 0$. The analogues of (1.8) and (1.9) pose greater difficulty.
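The step from (1.10)-(1.12) to (1.13) can be spelled out as follows. This is a sketch under the stated moment assumption, and the notation $\sigma^2(\theta) = \operatorname{Var}(W_1(\theta)\mid\theta)$ is introduced here only for illustration. Comparing (1.10) with (1.11) gives \begin{equation*}R_n = S_n(\theta)n^{-2} + R'_n,\end{equation*} and the law of the iterated logarithm gives \begin{equation*}\limsup_{n\rightarrow\infty} |S_n(\theta)|\lbrack 2n\log\log n\rbrack^{-\frac{1}{2}} = \sigma(\theta) \quad \operatorname{a.s.},\end{equation*} so that $S_n(\theta)n^{-2} = O(n^{-3/2}\lbrack\log\log n\rbrack^{\frac{1}{2}})$ a.s., while $R'_n = o(n^{-3/2})$ is of smaller order. Heuristically, at sample sizes of order $c^{-\frac{1}{2}}$ a remainder of this size is of order $c^{3/4}$ up to the iterated logarithm factor, which is the exponent appearing in (1.14).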
In Section 3 we show (Theorems 3.1, 3.2) that \begin{equation*}\tag{1.15}E(X(c) - 2\lbrack V(\theta)c\rbrack^{\frac{1}{2}}) = \max (o(c^{\frac{1}{2}+\delta(\lambda, b)-\epsilon}), O(c)),\end{equation*} for every $\epsilon > 0$, where \begin{equation*}\tag{1.16}\delta(\lambda, b) = \frac{1}{2}(\lambda - 1)b(b + (\lambda - 1))^{-1}\end{equation*} and $b$ and $\lambda$ depend on the problem. (Typically $\lambda = \frac{3}{2}$.) On the other hand, in Section 4 we establish (Theorem 4.1) that \begin{equation*}\tag{1.17}E(X_{\tilde{t}(c)}(c) - 2\lbrack V(\theta)c\rbrack^{\frac{1}{2}})^+ = \max (O(c^{\lambda/2}), O(c)),\end{equation*} for every $\epsilon > 0$, where again typically $\lambda = \frac{3}{2}$. Finally, in Section 5 we apply our general results to two special situations: (i) estimating the mean of a normal distribution with a normal prior; (ii) estimating $p$ on the basis of binomial trials with a beta prior. In case (i) our conditions yield $O(c)$ in both (1.15) and (1.17), and this is best possible. In case (ii), when for instance we have a uniform prior, the best $\lambda = \frac{3}{2}$ and the best $b = 1$, so that $\delta(\lambda, b) = \frac{1}{6}$ and $\lambda/2 = \frac{3}{4}$; we therefore get $o(c^{2/3-\epsilon})$ for every $\epsilon > 0$ in (1.15) and $o(c^{3/4-\epsilon})$ for every $\epsilon > 0$ in (1.17). We do not believe these are best possible. A further analysis of (1.11) would seem to be required for anything better.
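Case (ii) with a uniform prior lends itself to a small numerical illustration of the rule "Stop as soon as $Y_n \leqq c(n + 1)$". The following Python sketch is not taken from the paper. The function names `posterior_variance` and `apo_stop` are hypothetical, the convention that at least one observation is taken before the rule is checked is an assumption, and $V(p) = p(1 - p)$ is used as the reciprocal of the Fisher information of a single Bernoulli trial.

```python
import math
import random


def posterior_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    s = a + b
    return a * b / (s * s * (s + 1.0))


def apo_stop(p, c, a0=1.0, b0=1.0, rng=None):
    """Simulate Bernoulli(p) trials under a Beta(a0, b0) prior.

    Stops at the first n >= 1 with Y_n <= c * (n + 1), where Y_n is the
    posterior variance after n trials (assumption: the rule is first
    checked after one observation). Returns (n, X_n(c)) with
    X_n(c) = Y_n + n * c.
    """
    rng = rng or random.Random(0)
    a, b, n = a0, b0, 0
    while True:
        n += 1
        if rng.random() < p:
            a += 1.0  # success updates the first Beta parameter
        else:
            b += 1.0  # failure updates the second Beta parameter
        y = posterior_variance(a, b)
        if y <= c * (n + 1):
            return n, y + n * c


if __name__ == "__main__":
    p, c = 0.3, 1e-6
    n, x = apo_stop(p, c, rng=random.Random(0))
    leading = 2.0 * math.sqrt(p * (1.0 - p) * c)  # 2 [V(p) c]^(1/2)
    print("stopped at n =", n)
    print("X_n(c) =", x, " leading term =", leading)
```

For small $c$ the realized value of $X_{\tilde{t}(c)}(c)$ should be close to $2\lbrack V(p)c\rbrack^{\frac{1}{2}}$, in line with (1.5). A single simulated path illustrates the pointwise statement only and says nothing about the rates in (1.15) and (1.17).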

Article information

Ann. Math. Statist., Volume 40, Number 2 (1969), 417-426.

First available in Project Euclid: 27 April 2007

Bickel, Peter J.; Yahav, Joseph A. On an A.P.O. Rule in Sequential Estimation with Quadratic Loss. Ann. Math. Statist. 40 (1969), no. 2, 417--426. doi:10.1214/aoms/1177697706.
