April 2022
Optimal false discovery rate control for large scale multiple testing with auxiliary information
Hongyuan Cao, Jun Chen, Xianyang Zhang
Ann. Statist. 50(2): 807-857 (April 2022). DOI: 10.1214/21-AOS2128


Large-scale multiple testing is a fundamental problem in high-dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses a priori, where a shape-constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when the average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to simultaneously estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis. We show, both empirically and theoretically, that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.
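
To make the two-group idea concrete, here is a minimal Python sketch of a generic local false discovery rate (lfdr) computation and a step-up rejection rule on lfdr values. It is an illustration under stated assumptions, not the estimator or rule from the paper: the function names `local_fdr` and `step_up_rejections`, the choice of a Uniform(0, 1) null density, and the Beta-type alternative density used in the demo are all hypothetical, and the prior null probabilities are taken as given rather than estimated by an EM algorithm with a shape constraint.

```python
from typing import Callable, List, Sequence


def local_fdr(p: float, pi0: float, alt_density: Callable[[float], float]) -> float:
    """Local FDR of one p-value under a two-group mixture.

    Assumes the null density of p-values is Uniform(0, 1), i.e. f0(p) = 1,
    so lfdr = pi0 / (pi0 + (1 - pi0) * f1(p)).
    pi0 may differ from hypothesis to hypothesis.
    """
    null_part = pi0 * 1.0
    alt_part = (1.0 - pi0) * alt_density(p)
    return null_part / (null_part + alt_part)


def step_up_rejections(lfdr: Sequence[float], alpha: float) -> List[int]:
    """Reject the k hypotheses with the smallest lfdr values, where k is the
    largest count whose average lfdr among the rejected is at most alpha.

    Returns the (sorted) indices of the rejected hypotheses.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    running_sum = 0.0
    k = 0
    for rank, idx in enumerate(order, start=1):
        running_sum += lfdr[idx]
        if running_sum / rank <= alpha:
            k = rank
    return sorted(order[:k])


if __name__ == "__main__":
    # Hypothetical alternative density f1(p) = 0.5 * p**(-0.5), a Beta(0.5, 1) density.
    f1 = lambda p: 0.5 * p ** (-0.5)
    p_values = [0.0001, 0.002, 0.03, 0.4, 0.8]
    # Hypothetical prior null probabilities, nondecreasing along the ordering.
    pi0s = [0.5, 0.6, 0.7, 0.9, 0.95]
    lfdrs = [local_fdr(p, pi0, f1) for p, pi0 in zip(p_values, pi0s)]
    print(step_up_rejections(lfdrs, alpha=0.1))
```

Because the lfdr values are sorted in increasing order, their running average is nondecreasing, so the loop finds the largest admissible rejection set. In practice both the prior null probabilities and the alternative density would have to be estimated from the data.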

Funding Statement

H. Cao acknowledges partial support from NIH Grant 2UL1TR001427-05.
J. Chen acknowledges support from Mayo Clinic Center for Individualized Medicine.
X. Zhang acknowledges partial support from NSF Grants DMS-1830392 and DMS-1811747.
J. Chen and X. Zhang also acknowledge support from National Institutes of Health Grant R21 HG011662.


The authors are alphabetically ordered. Address correspondence to Xianyang Zhang (zhangxiany@stat.tamu.edu).

The authors would like to thank the Associate Editor and the reviewers for their constructive comments and helpful suggestions, which substantially improved the paper. Data on coronary artery disease/myocardial infarction have been contributed by CARDIoGRAMplusC4D investigators and have been downloaded from www.cardiogramplusc4d.org.



Hongyuan Cao, Jun Chen, Xianyang Zhang. "Optimal false discovery rate control for large scale multiple testing with auxiliary information." Ann. Statist. 50(2): 807-857, April 2022. https://doi.org/10.1214/21-AOS2128


Received: 1 September 2020; Revised: 1 March 2021; Published: April 2022
First available in Project Euclid: 7 April 2022

MathSciNet: MR4404920
zbMATH: 1486.62218
Digital Object Identifier: 10.1214/21-AOS2128

Primary: 62G07, 62G10
Secondary: 62C12

Keywords: EM algorithm, false discovery rate, isotonic regression, local false discovery rate, multiple testing, Pool-Adjacent-Violators algorithm

Rights: Copyright © 2022 Institute of Mathematical Statistics


