March 2024
A simple and flexible test of sample exchangeability with applications to statistical genomics
Alan J. Aw, Jeffrey P. Spence, Yun S. Song
Ann. Appl. Stat. 18(1): 858-881 (March 2024). DOI: 10.1214/23-AOAS1817

Abstract

In scientific studies involving analyses of multivariate data, basic but important questions often arise for the researcher: Is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Are the features independent of one another, or can the features be grouped so that the groups are mutually independent? In statistical genomics, these considerations are fundamental to downstream tasks such as demographic inference and the construction of polygenic risk scores. We propose a nonparametric approach, which we call the V test, to address these two questions, namely, a test of sample exchangeability given the dependency structure of features and a test of feature independence given sample exchangeability. Our test is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios and handles data of arbitrary dimensions by leveraging large-sample asymptotics. Through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our test compares favorably in various scenarios of interest. We apply the test to data from the 1000 Genomes Project, demonstrating how it can be employed to assess exchangeability of the genetic sample or find optimal linkage disequilibrium (LD) splits for downstream analysis. For exchangeability assessment, we find that removing rare variants can substantially increase the p-value of the test statistic. For optimal LD splitting, the V test reports different optimal splits than previous approaches not relying on hypothesis testing. Software for our methods is available in R (CRAN: flintyR) and Python (PyPI: flintyPy).
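
To make the first question in the abstract concrete, the following is a minimal Python sketch of a Monte Carlo permutation test for sample exchangeability when all features are assumed mutually independent. It is not the flintyPy API, and it should not be read as the authors' implementation. The function names, the choice of statistic (variance of pairwise Hamming distances between samples), and the use of brute-force permutations in place of the large-sample asymptotics mentioned in the abstract are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distance_variance(X):
    """Variance of the pairwise Hamming distances between rows (samples) of X.

    Hypothetical helper for this sketch; the statistic choice is an assumption.
    """
    n = X.shape[0]
    # diffs[i, j] = number of features at which samples i and j differ
    diffs = (X[:, None, :] != X[None, :, :]).sum(axis=2)
    upper = np.triu_indices(n, k=1)  # each unordered pair counted once
    return diffs[upper].var()


def permutation_exchangeability_test(X, n_perm=500, seed=0):
    """Monte Carlo p-value for H0: samples are exchangeable.

    Assumes the features (columns) are mutually independent, so that under H0
    each column can be shuffled across samples independently of the others.
    """
    rng = np.random.default_rng(seed)
    observed = pairwise_distance_variance(X)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle every column independently to simulate the null
        Xp = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        if pairwise_distance_variance(Xp) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive
    return observed, (exceed + 1) / (n_perm + 1)


def simulate_two_populations(n_per_pop=20, n_features=50, seed=1):
    """Toy binary data: two groups with very different allele frequencies."""
    rng = np.random.default_rng(seed)
    pop_a = rng.binomial(1, 0.1, size=(n_per_pop, n_features))
    pop_b = rng.binomial(1, 0.9, size=(n_per_pop, n_features))
    return np.vstack([pop_a, pop_b])
```

In this toy setup, calling `permutation_exchangeability_test(simulate_two_populations())` on the strongly stratified sample should yield a small p-value, because between-group distances are much larger than within-group distances and so inflate the variance relative to the shuffled data. For real genomic data, where features are dependent within LD blocks, a column-wise shuffle like this one would not be valid as written.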

Funding Statement

This research is supported in part by an NIH Grant R35-GM134922.

Acknowledgments

We thank Dan Erdmann-Pham, Ziyue Gao, Iain Mathieson, Nick Patterson, Sebastián Prillo, Florian Privé and Clara Wong-Fannjiang for helpful discussions. All authors are affiliated with the Center for Computational Biology at UC Berkeley.

Citation

Alan J. Aw, Jeffrey P. Spence, Yun S. Song. "A simple and flexible test of sample exchangeability with applications to statistical genomics." Ann. Appl. Stat. 18(1): 858-881, March 2024. https://doi.org/10.1214/23-AOAS1817

Information

Received: 1 November 2022; Revised: 1 March 2023; Published: March 2024
First available in Project Euclid: 31 January 2024

MathSciNet: MR4698634
Digital Object Identifier: 10.1214/23-AOAS1817

Keywords: exchangeability, feature independence, LD splitting, nonparametric test, population stratification

Rights: Copyright © 2024 Institute of Mathematical Statistics

Vol. 18 • No. 1 • March 2024