September 2024
Patient recruitment using electronic health records under selection bias: A two-phase sampling framework
Guanghao Zhang, Lauren J. Beesley, Bhramar Mukherjee, Xu Shi
Ann. Appl. Stat. 18(3): 1858-1878 (September 2024). DOI: 10.1214/23-AOAS1860

Abstract

Electronic health records (EHRs) are increasingly recognized as a cost-effective resource for patient recruitment in clinical research. However, how to optimally select a cohort from millions of individuals to answer a scientific question of interest remains unclear. Consider a study to estimate the mean or mean difference of an expensive outcome. Inexpensive auxiliary covariates predictive of the outcome may often be available in patients’ health records, presenting an opportunity to recruit patients selectively, which may improve efficiency in downstream analyses. In this paper we propose a two-phase sampling design that leverages available information on auxiliary covariates in EHR data. A key challenge in using EHR data for multiphase sampling is the potential selection bias, because EHR data are not necessarily representative of the target population. Extending existing literature on two-phase sampling design, we derive an optimal two-phase sampling method that improves efficiency over random sampling while accounting for the potential selection bias in EHR data. We demonstrate the efficiency gain from our sampling design via simulation studies and an application evaluating the prevalence of hypertension among U.S. adults leveraging data from the Michigan Genomics Initiative, a longitudinal biorepository in Michigan Medicine.

Acknowledgments

We thank the Michigan Genomics Initiative participants, Precision Health at the University of Michigan, and the University of Michigan Medical School Data Office for Clinical and Translational Research for providing data storage, management, processing, and distribution services. We thank the Advanced Research Computing Technology Services at the University of Michigan for providing data storage and computing resources. The study protocols were reviewed and determined exempt by the University of Michigan Medical School Institutional Review Board (IRB ID HUM00177982). This study was supported by the National Institutes of Health (award R01GM139926 to Xu Shi), the National Science Foundation (award 1712933 to Bhramar Mukherjee), and the National Cancer Institute (award CA046592-34 to Bhramar Mukherjee).

Citation

Guanghao Zhang, Lauren J. Beesley, Bhramar Mukherjee, Xu Shi. "Patient recruitment using electronic health records under selection bias: A two-phase sampling framework." Ann. Appl. Stat. 18(3): 1858-1878, September 2024. https://doi.org/10.1214/23-AOAS1860

Information

Received: 1 June 2022; Revised: 1 December 2023; Published: September 2024
First available in Project Euclid: 5 August 2024

Digital Object Identifier: 10.1214/23-AOAS1860

Keywords: auxiliary information, electronic health records, selection bias, study design, two-phase sampling

Rights: Copyright © 2024 Institute of Mathematical Statistics

Vol. 18 • No. 3 • September 2024