Modeling the visibility distribution for respondent-driven sampling with application to population size estimation

Katherine R. McLaughlin; Lisa G. Johnston; Xhevat Jakupi; Dafina Gexha-Bunjaku; Edona Deva; Mark S. Handcock

doi:10.1214/23-AOAS1807

Abstract

Respondent-driven sampling (RDS) is used throughout the world to estimate prevalence and population size for hidden populations. Although RDS is an effective method for enrolling people from key populations in studies, it relies on a partially unknown sampling mechanism, so each individual's inclusion probability is unknown. Current estimators for population prevalence, population size, and other outcomes use a participant's network size (degree) to approximate their inclusion probability in the sample from the networked population. However, in most RDS studies a participant's network size is obtained by self-report and is subject to many types of misreporting and bias. Because design-based inclusion probabilities cannot be computed exactly, we instead use the term visibility to describe how likely a person is to be selected to participate in the study. The commonly used successive sampling population size estimation (SS-PSE) framework for estimating population sizes from RDS data relies on self-reported network sizes in its model for the sampling mechanism. We propose an enhancement of the SS-PSE framework that adds two components: a measurement error model for visibility, which is used in place of the self-reported network size, and a model for the number of recruits an individual can enroll. Inferred visibilities smooth the degree distribution and pull in outlying reported values, and they also provide a mechanism for handling missing and invalid network sizes. We demonstrate the performance of visibility SS-PSE on three populations from Kosovo sampled in 2014 using RDS. We also discuss how the visibility modeling framework could be extended to prevalence estimation.
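To make the two ideas in the abstract concrete, the sketch below simulates (a) a successive sample, in which units enter one at a time without replacement with probability proportional to a positive size variable (here called visibility), and (b) a self-reported degree that is a noisy, heaped version of that visibility. It is an illustration only, not the authors' visibility SS-PSE implementation. The function names, the lognormal noise, and the round-to-a-multiple-of-5 heaping rule are hypothetical assumptions chosen for the example.

```python
import random


def successive_sample(visibilities, n, rng):
    """Draw n distinct unit indices without replacement. At each step, the
    next unit is chosen from those not yet sampled with probability
    proportional to its visibility."""
    if n > len(visibilities):
        raise ValueError("sample size exceeds population size")
    remaining = list(range(len(visibilities)))
    order = []
    for _ in range(n):
        weights = [visibilities[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)  # without replacement
        order.append(pick)
    return order


def heaped_report(visibility, rng, sigma=0.5, heap_prob=0.3):
    """Hypothetical misreporting model: multiplicative lognormal noise on the
    true visibility, then, with probability heap_prob, rounding to the
    nearest multiple of 5 (heaping)."""
    noisy = visibility * rng.lognormvariate(0.0, sigma)
    report = max(1, round(noisy))
    if rng.random() < heap_prob:
        report = max(5, 5 * round(report / 5))
    return report


# Usage: half the population has low visibility, half has high visibility.
rng = random.Random(2014)
vis = [1.0] * 50 + [100.0] * 50
sample = successive_sample(vis, 10, rng)
reports = [heaped_report(vis[i], rng) for i in sample]
print("sampled visibilities:", [vis[i] for i in sample])
print("self-reported degrees:", reports)
```

High-visibility units dominate the early draws, which is why an estimator needs each respondent's size variable. The reported degrees, however, scatter and pile up on multiples of 5 around the true values, so plugging them in directly propagates that error. A measurement error model of the kind the abstract proposes treats visibility as latent and the report as a noisy observation of it.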

Funding Statement

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1144087.

Acknowledgments

The authors would like to thank the National Institute for Public Health of Kosovo for their work designing, implementing, and lending their expertise to the RDS study used for the examples in Section 5. The authors would also like to thank members of the Hard-to-Reach Populations Methods Research Group (HPMRG) and the RDS Analyst Users Group for helpful comments on the implementation of the method.

Citation

Katherine R. McLaughlin. Lisa G. Johnston. Xhevat Jakupi. Dafina Gexha-Bunjaku. Edona Deva. Mark S. Handcock. "Modeling the visibility distribution for respondent-driven sampling with application to population size estimation." Ann. Appl. Stat. 18 (1) 683 - 703, March 2024. https://doi.org/10.1214/23-AOAS1807

Information

Received: 1 December 2021; Revised: 1 July 2023; Published: March 2024

First available in Project Euclid: 31 January 2024

MathSciNet: MR4698626

Digital Object Identifier: 10.1214/23-AOAS1807

Keywords: heaped data, hidden population, measurement error model, model-based survey sampling, network sampling
