The Annals of Statistics

Empirical Distributions in Selection Bias Models

Y. Vardi

Abstract

The following problem is treated: Given $s$ not-necessarily-random samples from an unknown distribution $F$, and assuming that we know the sampling rule of each sample, is it possible to combine the samples in order to estimate $F$, and if so, what is the natural way of doing it? More formally, this translates to the problem of determining whether there exists a nonparametric maximum likelihood estimate (NPMLE) of $F$ on the basis of $s$ samples from weighted versions of $F$, with known weight functions, and, if it exists, how to construct it. We give a simple necessary and sufficient condition, which can be checked graphically, for the existence and uniqueness of the NPMLE and, under this condition, we describe a simple method for constructing it. The method is numerically efficient and mathematically interesting because it reduces the problem to one of solving $s - 1$ nonlinear equations with $s - 1$ unknowns, the unique solution of which is easily obtained by the iterative Gauss-Seidel-type scheme described in the paper. Extensions for the case where the weight functions are not completely specified and for censored samples, applications, numerical examples, and statistical properties of the NPMLE are discussed. In particular, we prove under this condition that the NPMLE is a sufficient statistic for $F$. The technique has many potential applications, because it is not limited to the case where the sampled items are univariate. A FORTRAN program for the described algorithm is available from the author.
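To make the estimation problem concrete, here is a minimal Python sketch of one way to compute such an estimate. It is not the paper's Gauss-Seidel-type scheme or its FORTRAN program; it is a plain fixed-point (self-consistency) iteration on the likelihood equations, and it assumes the existence and uniqueness condition from the abstract holds. The function name `npmle_biased_samples`, its arguments, and the notation below are illustrative assumptions: sample $i$ has $n_i$ observations drawn with density proportional to $w_i(x)\,dF(x)$, the pooled distinct points are $t_j$ with multiplicities $r_j$, and the iteration alternates between $p_j \propto r_j / \sum_i n_i w_i(t_j)/W_i$ and $W_i = \sum_j w_i(t_j) p_j$.

```python
def npmle_biased_samples(samples, weights, tol=1e-12, max_iter=10_000):
    """Sketch of an NPMLE for F from several weighted (biased) samples.

    samples: list of non-empty lists of observations (assumed hashable, sortable).
    weights: list of known weight functions, one per sample; weights[i](x) must be
             positive at every point observed in samples[i].
    Returns (points, p): the pooled distinct points and the estimated mass at each.
    """
    # Pool the samples: distinct points t_j and their multiplicities r_j.
    counts = {}
    for sample in samples:
        for x in sample:
            counts[x] = counts.get(x, 0) + 1
    points = sorted(counts)
    r = [counts[t] for t in points]

    s, h = len(samples), len(points)
    n = [len(sample) for sample in samples]               # sample sizes n_i
    w = [[w_i(t) for t in points] for w_i in weights]     # w[i][j] = w_i(t_j)

    W = [1.0] * s            # starting guess for the normalizing constants W_i
    p = [1.0 / h] * h        # starting guess for the point masses p_j
    for _ in range(max_iter):
        # Likelihood equations: p_j proportional to r_j / sum_i n_i w_i(t_j) / W_i.
        raw = [r[j] / sum(n[i] * w[i][j] / W[i] for i in range(s))
               for j in range(h)]
        total = sum(raw)
        new_p = [v / total for v in raw]
        # Update the normalizing constants W_i = sum_j w_i(t_j) p_j.
        W = [sum(w[i][j] * new_p[j] for j in range(h)) for i in range(s)]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return points, p


# Usage: one length-biased sample (w(x) = x); the estimate is proportional to 1/x.
points, p = npmle_biased_samples([[1.0, 2.0, 4.0]], [lambda x: x])
print(points, p)
```

With a single length-biased sample the iteration settles immediately on masses proportional to $r_j / t_j$. With several samples, the masses and the constants $W_i$ are updated in turn until the masses stop changing.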

Article information

Source
Ann. Statist., Volume 13, Number 1 (1985), 178-203.

Dates
First available in Project Euclid: 12 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1176346585

Digital Object Identifier
doi:10.1214/aos/1176346585

Mathematical Reviews number (MathSciNet)
MR773161

Zentralblatt MATH identifier
0578.62047

Subjects
Primary: 62G05: Estimation
Secondary: 62E99: None of the above, but in this section; 62M99: None of the above, but in this section; 62P10: Applications to biology and medical sciences

Keywords
Nonparametric maximum likelihood; sample selection bias; weighted distributions

Citation

Vardi, Y. Empirical Distributions in Selection Bias Models. Ann. Statist. 13 (1985), no. 1, 178--203. doi:10.1214/aos/1176346585. https://projecteuclid.org/euclid.aos/1176346585