The Annals of Applied Statistics

Estimating population size using the network scale up method

Rachael Maltiel, Adrian E. Raftery, Tyler H. McCormick, and Aaron J. Baraff

Full-text: Open access

Abstract

We develop methods for estimating the size of hard-to-reach populations from data collected using network-based questions on standard surveys. Such data arise by asking respondents how many people they know in a specific group (e.g., people named Michael, intravenous drug users). The Network Scale up Method (NSUM) is a tool for producing population size estimates using these indirect measures of respondents’ networks. Killworth et al. [Soc. Netw. 20 (1998a) 23–50, Evaluation Review 22 (1998b) 289–308] proposed maximum likelihood estimators of population size for a fixed effects model in which respondents’ degrees or personal network sizes are treated as fixed. We extend this by treating personal network sizes as random effects, yielding principled statements of uncertainty. This allows us to generalize the model to account for variation in people’s propensity to know people in particular subgroups (barrier effects), such as their tendency to know people like themselves, as well as their lack of awareness of or reluctance to acknowledge their contacts’ group memberships (transmission bias). NSUM estimates also suffer from recall bias, in which respondents tend to underestimate the number of members of larger groups that they know, and conversely for smaller groups. We propose a data-driven adjustment method to deal with this. Our methods perform well in simulation studies, generating improved estimates and calibrated uncertainty intervals, as well as in back estimates of real sample data. We apply them to data from a study of HIV/AIDS prevalence in Curitiba, Brazil. Our results show that when transmission bias is present, external information about its likely extent can greatly improve the estimates. The methods are implemented in the NSUM R package.
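As a rough illustration of the fixed-effects starting point mentioned above, the basic scale-up estimator attributed to Killworth et al. can be sketched in a few lines of Python. This is a minimal sketch, not the random-effects model developed in the paper and not the interface of the NSUM R package. The function names and the toy survey numbers are hypothetical, and the sketch assumes the commonly stated form of the estimator: a respondent's degree is estimated as N times the total number of contacts reported in groups of known size, divided by the total size of those groups, and the hidden population size as N times the total reported hidden-group contacts, divided by the sum of the estimated degrees.

```python
from typing import Sequence


def estimate_degree(known_counts: Sequence[int],
                    known_sizes: Sequence[int],
                    total_population: int) -> float:
    """Estimate one respondent's personal network size (degree).

    known_counts[k] is how many people the respondent reports knowing
    in known group k; known_sizes[k] is that group's true size.
    """
    return total_population * sum(known_counts) / sum(known_sizes)


def estimate_hidden_size(hidden_counts: Sequence[int],
                         degrees: Sequence[float],
                         total_population: int) -> float:
    """Scale up reported hidden-group contacts to a population size."""
    return total_population * sum(hidden_counts) / sum(degrees)


# Hypothetical toy survey: 3 respondents, 2 groups of known size.
N = 1_000_000                       # assumed total population
known_sizes = [10_000, 30_000]      # assumed sizes of the known groups
known_reports = [[2, 6], [1, 3], [3, 9]]  # contacts reported per known group
hidden_reports = [1, 0, 2]          # contacts reported in the hidden group

degrees = [estimate_degree(y, known_sizes, N) for y in known_reports]
hidden_size = estimate_hidden_size(hidden_reports, degrees, N)
print(degrees, hidden_size)
```

With these made-up numbers the estimated degrees are 200, 100 and 300, and the hidden group is estimated at 5,000 people. The sketch returns a point estimate only and ignores barrier effects, transmission bias and recall bias, which are the issues the paper's model is designed to address.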

Article information

Source
Ann. Appl. Stat., Volume 9, Number 3 (2015), 1247–1277.

Dates
Received: August 2013
Revised: December 2014
First available in Project Euclid: 2 November 2015

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1446488738

Digital Object Identifier
doi:10.1214/15-AOAS827

Mathematical Reviews number (MathSciNet)
MR3418722

Zentralblatt MATH identifier
06525985

Keywords
Aggregated relational data; barrier effect; HIV/AIDS; recall bias; social network; transmission bias

Citation

Maltiel, Rachael; Raftery, Adrian E.; McCormick, Tyler H.; Baraff, Aaron J. Estimating population size using the network scale up method. Ann. Appl. Stat. 9 (2015), no. 3, 1247–1277. doi:10.1214/15-AOAS827. https://projecteuclid.org/euclid.aoas/1446488738



References

  • Bernard, R. H., Johnsen, E., Killworth, P. and Robinson, S. (1989). Estimating the size of an average personal network and of an event subpopulation. In The Small World (M. Kochen, ed.) 159–175. Ablex Press, New Jersey.
  • Bernard, R. H., Johnsen, E., Killworth, P. and Robinson, S. (1991). Estimating the size of an average personal network and of an event subpopulation: Some empirical results. Soc. Sci. Res. 20 109–121.
  • De Valpine, P. (2003). Better inferences from population-dynamics experiments using Monte Carlo state-space likelihood methods. Ecology 84 3064–3077.
  • Diggle, P. J., Heagerty, P. J., Liang, K.-Y. and Zeger, S. L. (2002). Analysis of Longitudinal Data, 2nd ed. Oxford Statistical Science Series 25. Oxford Univ. Press, Oxford.
  • Ezoe, S., Morooka, T., Noda, T., Sabin, M. L. and Koike, S. (2012). Population size estimation of men who have sex with men through the network scale-up method in Japan. PLoS ONE 7 e31184.
  • Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statist. Sci. 7 457–472.
  • Jeffreys, H. (1961). Theory of Probability, 3rd ed. Clarendon Press, Oxford.
  • Kadushin, C., Killworth, P., Bernard, H. and Beveridge, A. (2006). Scale-up methods as applied to estimates of heroin use. J. Drug Issues 36 417.
  • Killworth, P., Johnsen, E., McCarty, C., Shelley, G. and Bernard, H. (1998a). A social network approach to estimating seroprevalence in the United States. Soc. Netw. 20 23–50.
  • Killworth, P., McCarty, C., Bernard, H., Shelley, G. and Johnsen, E. (1998b). Estimation of seroprevalence, rape, and homelessness in the United States using a social network approach. Evaluation Review 22 289–308.
  • Killworth, P. D., McCarty, C., Bernard, H. R., Johnsen, E. C., Domini, J. and Shelley, G. A. (2003). Two interpretations of reports of knowledge of subpopulation sizes. Soc. Netw. 25 141–160.
  • Killworth, P. D., McCarty, C., Johnsen, E. C., Bernard, H. R. and Shelley, G. A. (2006). Investigating the variation of personal network size under unknown error conditions. Sociol. Methods Res. 35 84–112.
  • McCarty, C., Killworth, P. D., Bernard, H. R., Johnsen, E. C. and Shelley, G. A. (2001). Comparing two methods for estimating network size. Human Organ. 60 28–39.
  • McCormick, T. H., Salganik, M. J. and Zheng, T. (2010). How many people do you know? Efficiently estimating personal network size. J. Amer. Statist. Assoc. 105 59–70.
  • McCormick, T. H. and Zheng, T. (2007). Adjusting for recall bias in “How many X’s do you know?” surveys. In Proceedings of the Joint Statistical Meetings. American Statistical Association, Washington, DC.
  • McCormick, T. H. and Zheng, T. (2012). Latent demographic profile estimation in hard-to-reach groups. Ann. Appl. Stat. 6 1795–1813.
  • Mielke, P. Jr. (1975). Convenient beta distribution likelihood techniques for describing and comparing meteorological data. J. Appl. Meteorol. 14 985–990.
  • Paniotto, V., Petrenko, T., Kupriyanov, V. and Pakhok, O. (2009). Estimating the size of populations with high risk for HIV using the network scale-up method. Analytical report, Kiev International Institute of Sociology.
  • Raftery, A. E. (1988). Inference and prediction for the binomial N parameter: A hierarchical Bayes approach. Biometrika 75 223–228.
  • Raftery, A. E. and Lewis, S. M. (1996). Implementing MCMC. In Markov Chain Monte Carlo in Practice (W. R. Gilks, S. Richardson and D. J. Spiegelhalter, eds.) 115–130. Chapman & Hall, London.
  • Ripley, B. D. and Thompson, M. (1987). Regression techniques for the detection of analytical bias. Analyst 112 377–383.
  • Salganik, M., Fazito, D., Bertoni, N., Abdo, A., Mello, M. and Bastos, F. (2011a). Assessing network scale-up estimates for groups most at risk of HIV/AIDS: Evidence from a multiple-method study of heavy drug users in Curitiba, Brazil. Am. J. Epidemiol. 174 1190–1196.
  • Salganik, M. J., Mello, M. B., Abdo, A. H., Bertoni, N., Fazito, D. and Bastos, F. I. (2011b). The game of contacts: Estimating the social visibility of groups. Soc. Netw. 33 70–78.
  • Skellam, J. G. (1948). A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. J. Roy. Statist. Soc. Ser. B 10 257–261.
  • Zheng, T., Salganik, M. J. and Gelman, A. (2006). How many people do you know in prison?: Using overdispersion in count data to estimate social structure in networks. J. Amer. Statist. Assoc. 101 409–423.