Electronic Journal of Statistics

Novel sampling design for respondent-driven sampling

Mohammad Khabbazian, Bret Hanlon, Zoe Russek, and Karl Rohe

Full-text: Open access


Respondent-driven sampling (RDS) is a method of chain referral sampling popular for sampling hidden and/or marginalized populations. Even under ideal sampling assumptions, the performance of RDS is restricted by the underlying social network: if the network is divided into communities that are weakly connected to each other, then RDS is likely to oversample one of these communities. To diminish the “referral bottlenecks” between communities, we propose anti-cluster RDS (AC-RDS), an adjustment to the standard RDS implementation. Using a standard model in the RDS literature, namely, a Markov process on the social network that is indexed by a tree, we construct and study the Markov transition matrix for AC-RDS. We show that if the underlying network is generated from the stochastic blockmodel with equal block sizes, then the transition matrix for AC-RDS has a larger spectral gap, and consequently faster mixing, than the standard random walk model for RDS. In addition, we show that AC-RDS reduces the covariance of the samples in the referral tree compared to standard RDS, and consequently leads to a smaller variance and design effect. We confirm the effectiveness of the new design using both the Add-Health networks and simulated networks.
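The link between weakly connected communities and slow mixing can be illustrated with a minimal sketch. It covers only the standard simple random walk, not AC-RDS, whose transition matrix is not specified in this abstract. The function names, the block size, and the probabilities are hypothetical choices for illustration, and the expected adjacency matrix of a two-block stochastic blockmodel (self-loops included) stands in for a sampled network.

```python
import numpy as np

def expected_sbm_adjacency(n_per_block, p_within, q_between):
    """Expected adjacency of a two-block SBM with equal block sizes.

    Assumption: self-loops are kept so that the matrix is exactly block-constant.
    """
    block_probs = np.array([[p_within, q_between],
                            [q_between, p_within]])
    return np.kron(block_probs, np.ones((n_per_block, n_per_block)))

def random_walk_spectral_gap(adjacency):
    """Spectral gap 1 - |lambda_2| of the simple random walk P = D^{-1} A."""
    degrees = adjacency.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # D^{-1/2} A D^{-1/2} is symmetric and has the same eigenvalues as D^{-1} A.
    symmetric = adjacency * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    moduli = np.sort(np.abs(np.linalg.eigvalsh(symmetric)))[::-1]
    return 1.0 - moduli[1]

# Weakly connected communities (small q) versus better connected ones.
weak_gap = random_walk_spectral_gap(expected_sbm_adjacency(50, 0.10, 0.01))
strong_gap = random_walk_spectral_gap(expected_sbm_adjacency(50, 0.10, 0.05))

print(f"gap with weak between-block ties:   {weak_gap:.4f}")
print(f"gap with strong between-block ties: {strong_gap:.4f}")
```

For this block-constant matrix the second eigenvalue of the walk is (p - q)/(p + q), so the gap equals 2q/(p + q) and shrinks as the between-community probability q falls. That shrinking gap is the referral bottleneck the abstract describes.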

Article information

Electron. J. Statist., Volume 11, Number 2 (2017), 4769-4812.

Received: March 2017
First available in Project Euclid: 27 November 2017

Permanent link to this document: https://projecteuclid.org/euclid.ejs/1511773484

Digital Object Identifier: doi:10.1214/17-EJS1358

Keywords: Hard-to-reach population; respondent-driven sampling; social network; Markov chain; stochastic blockmodels; anti-cluster RDS

Creative Commons Attribution 4.0 International License.


Khabbazian, Mohammad; Hanlon, Bret; Russek, Zoe; Rohe, Karl. Novel sampling design for respondent-driven sampling. Electron. J. Statist. 11 (2017), no. 2, 4769--4812. doi:10.1214/17-EJS1358. https://projecteuclid.org/euclid.ejs/1511773484


