In this paper we consider the estimation of population size from one-source capture–recapture data, that is, a list in which individuals can potentially be found repeatedly and where the question is how many individuals are missed by the list. As a typical example, we provide data from a drug user study in Bangkok from 2001 where the list consists of drug users who repeatedly contact treatment institutions. Drug users with 1, 2, 3, … contacts occur, but drug users with zero contacts are not present, requiring the size of this group to be estimated. Statistically, these data can be considered as stemming from a zero-truncated count distribution. We revisit an estimator for the population size suggested by Zelterman that is known to be robust under potential unobserved heterogeneity. We demonstrate that the Zelterman estimator can be viewed as a maximum likelihood estimator for a locally truncated Poisson likelihood which is equivalent to a binomial likelihood. This result allows the extension of the Zelterman estimator by means of logistic regression to include observed heterogeneity in the form of covariates. We also review an estimator proposed by Chao and explain why we are not able to obtain similar results for this estimator. The Zelterman estimator is applied in two case studies, the first a drug user study from Bangkok, the second an illegal immigrant study in the Netherlands. Our results suggest the new estimator should be used, in particular, if substantial unobserved heterogeneity is present.
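To make the two estimators discussed above concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard forms of the estimators: with f1 and f2 the numbers of individuals observed exactly once and exactly twice, and n the number of observed individuals, the Zelterman estimate of the Poisson parameter is 2·f2/f1 and the population size estimate is n/(1 − exp(−λ)); Chao's lower bound is n + f1²/(2·f2). The function names and the frequency counts in the usage note are hypothetical and chosen only for illustration; they are not the Bangkok or Netherlands data.

```python
import math


def zelterman(freq):
    """Zelterman-type estimate from a dict {count j >= 1: number of individuals seen j times}.

    Hypothetical helper for illustration; uses only f1 and f2 to estimate lambda.
    """
    f1, f2 = freq[1], freq[2]
    n = sum(freq.values())  # number of observed individuals

    # Binomial view: among individuals seen once or twice,
    # p = P(Y = 2 | Y in {1, 2}) = lambda / (2 + lambda) under a Poisson model,
    # and the binomial maximum likelihood estimate of p is f2 / (f1 + f2).
    p_hat = f2 / (f1 + f2)
    lam_hat = 2 * p_hat / (1 - p_hat)  # algebraically equal to 2 * f2 / f1

    # Horvitz-Thompson style scaling by the estimated probability of being seen at all.
    n_hat = n / (1 - math.exp(-lam_hat))
    return lam_hat, n_hat


def chao(freq):
    """Chao's lower-bound estimate n + f1^2 / (2 * f2) from the same frequency dict."""
    n = sum(freq.values())
    return n + freq[1] ** 2 / (2 * freq[2])
```

As a usage example, calling `zelterman({1: 100, 2: 50, 3: 10})` on these made-up counts returns a Poisson parameter of about 1 and a population size of 160/(1 − e⁻¹), while `chao` on the same counts gives 160 + 100²/(2·50). Writing the Poisson parameter as a function of the binomial probability is what makes a logistic regression on the indicator "seen twice versus once" a natural route to covariates.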
References
Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge. MR381130
Böhning, D., Suppawattanabodee, B., Kusolvisitkul, W. and Viwatwongkasem, C. (2004). Estimating the number of drug users in Bangkok 2001: A capture–recapture approach using repeated entries in one list. European Journal of Epidemiology 19 1075–1083.
Böhning, D. and Schön, D. (2005). Nonparametric maximum likelihood estimation of the population size based upon the counting distribution. J. Roy. Statist. Soc. Ser. C 54 721–737.
Böhning, D. and Kuhnert, R. (2006). The equivalence of truncated count mixture distributions and mixture of truncated count distributions. Biometrics 62 1207–1215.
Böhning, D. and van der Heijden, P. G. M. (2008a). Supplement to “A covariate adjustment for zero-truncated approaches to estimating the size of hidden and elusive populations.” DOI: 10.1214/08-AOAS214SUPP.
Böhning, D. and van der Heijden, P. G. M. (2008b). Supplement to “A covariate adjustment for zero-truncated approaches to estimating the size of hidden and elusive populations.”
Böhning, D. and van der Heijden, P. G. M. (2008c). Supplement to “A covariate adjustment for zero-truncated approaches to estimating the size of hidden and elusive populations.”
Chao, A. (1987). Estimating the population size for capture–recapture data with unequal capture probabilities. Biometrics 43 783–791. MR920467
Chao, A. (1989). Estimating population size for sparse data in capture–recapture experiments. Biometrics 45 427–438.
Gurmu, S. (1991). Tests for detecting overdispersion in the positive Poisson regression model. J. Bus. Econom. Statist. 9 215–222.
Hay, G. and Smit, F. (2003). Estimating the number of drug injectors from needle exchange data. Addiction Research and Theory 11 235–243.
Hook, E. B. and Regal, R. (1995). Capture–recapture methods in epidemiology: Methods and limitations. Epidemiologic Reviews 17 243–264.
Huggins, R. M. (1989). On the statistical analysis of capture experiments. Biometrika 76 133–140. MR991431
International Working Group for Disease Monitoring and Forecasting (1995a). Capture–recapture and multiple record systems estimation I: History and theoretical development. American Journal of Epidemiology 142 1047–1058.
International Working Group for Disease Monitoring and Forecasting (1995b). Capture–recapture and multiple record systems estimation II: Applications in human diseases. American Journal of Epidemiology 142 1059–1068.
McKendrick, A. G. (1926). Application of mathematics to medical problems. Proc. Edinb. Math. Soc. 44 98–130.
Roberts, J. M. and Brewer, D. D. (2006). Estimating the prevalence of male clients of prostitute women in Vancouver with a simple capture–recapture method. J. Roy. Statist. Soc. Ser. A 169 745–756.
Ross, S. M. (1985). Introduction to Probability Models, 3rd ed. Academic Press, Orlando, FL.
Smit, F., Reinking, D. and Reijerse, M. (2002). Estimating the number of people eligible for health service use. Evaluation and Program Planning 25 101–105.
Thompson, S. K. (2002). Sampling, 2nd ed. Wiley, New York.
van der Heijden, P. G. M., Bustami, R., Cruyff, M., Engbersen, G. and van Houwelingen, H. C. (2003a). Point and interval estimation of the population size using the truncated Poisson regression model. Stat. Model. 3 305–322.
van der Heijden, P. G. M., Cruyff, M. and van Houwelingen, H. C. (2003b). Estimating the size of a criminal population from police records using the truncated Poisson regression model. Statist. Neerlandica 57 1–16.
Van Hest, N. H. A., Grant, A. D., Smit, F., Story, A. and Richardus, J. H. (2007). Estimating infectious diseases incidence: Validity of capture–recapture analysis and truncated models for incomplete count data. Epidemiology and Infection 136 14–22.
Wilson, R. M. and Collins, M. F. (1992). Capture–recapture estimation with samples of size one using frequency data. Biometrika 79 543–553.
Zelterman, D. (1988). Robust estimation in truncated discrete distributions with applications to capture–recapture experiments. J. Statist. Plann. Inference 18 225–237. MR922210