The Annals of Statistics

Large Sample Theory of Empirical Distributions in Biased Sampling Models

Richard D. Gill, Yehuda Vardi, and Jon A. Wellner



Vardi (1985a) introduced an $s$-sample model for biased sampling, gave conditions which guarantee the existence and uniqueness of the nonparametric maximum likelihood estimator $\mathbb{G}_n$ of the common underlying distribution $G$ and discussed numerical methods for calculating the estimator. Here we examine the large sample behavior of the NPMLE $\mathbb{G}_n$, including results on uniform consistency of $\mathbb{G}_n$, convergence of $\sqrt n (\mathbb{G}_n - G)$ to a Gaussian process and asymptotic efficiency of $\mathbb{G}_n$ as an estimator of $G$. The proofs are based upon recent results for empirical processes indexed by sets and functions and convexity arguments. We also give a careful proof of identifiability of the underlying distribution $G$ under connectedness of a certain graph $\mathbf{G}$. Examples and applications include length-biased sampling, stratified sampling, "enriched" stratified sampling, "choice-based" sampling in econometrics and "case-control" studies in biostatistics. A final section discusses design issues and further problems.
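As an illustration of the simplest example named above, the following is a minimal sketch of the NPMLE under length-biased sampling with a single sample ($s = 1$) and weight function $w(x) = x$. In that special case the estimator is known to place mass proportional to $1/x_i$ on each observation, so no iteration is needed. This is not the general $s$-sample estimator of Vardi (1985a), which requires solving a system of nonlinear equations; the function names are hypothetical and chosen only for this sketch.

```python
from typing import List, Tuple


def length_biased_npmle(sample: List[float]) -> List[Tuple[float, float]]:
    """One-sample NPMLE of G under length bias, w(x) = x (assumed setting).

    Each observed x_i > 0 receives mass proportional to 1 / x_i,
    which undoes the over-representation of large values.
    """
    if not sample or any(x <= 0 for x in sample):
        raise ValueError("sample must be non-empty with positive values")
    inverse = [1.0 / x for x in sample]
    total = sum(inverse)
    return [(x, inv / total) for x, inv in zip(sample, inverse)]


def npmle_cdf(sample: List[float], t: float) -> float:
    """Estimated G(t): total NPMLE mass on observations <= t."""
    return sum(w for x, w in length_biased_npmle(sample) if x <= t)


def npmle_mean(sample: List[float]) -> float:
    """Mean of the estimated G, which equals the harmonic mean of the sample."""
    return sum(x * w for x, w in length_biased_npmle(sample))


if __name__ == "__main__":
    observed = [1.0, 2.0, 4.0]  # hypothetical length-biased observations
    for point, mass in length_biased_npmle(observed):
        print(point, mass)
    print(npmle_cdf(observed, 2.0), npmle_mean(observed))
```

The unweighted empirical distribution of the same observations would estimate the length-biased distribution rather than $G$; the reweighting by $1/x_i$ is what targets $G$ itself.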

Article information

Ann. Statist., Volume 16, Number 3 (1988), 1069-1112.

First available in Project Euclid: 12 April 2007


Primary: 62G05 (Estimation)
Secondary: 60F05 (Central limit and other weak theorems), 62G30 (Order statistics; empirical distribution functions), 60G44 (Martingales with continuous parameter)

Asymptotic theory; case-control studies; choice based sampling; empirical processes; enriched stratified sampling; graphs; length-biased sampling; Neyman allocation; nonparametric maximum likelihood; selection bias models; stratified sampling; Vardi's estimator


Gill, Richard D.; Vardi, Yehuda; Wellner, Jon A. Large Sample Theory of Empirical Distributions in Biased Sampling Models. Ann. Statist. 16 (1988), no. 3, 1069--1112. doi:10.1214/aos/1176350948.
