Open Access
Randomized incomplete $U$-statistics in high dimensions
Xiaohui Chen, Kengo Kato
Ann. Statist. 47(6): 3127-3156 (December 2019). DOI: 10.1214/18-AOS1773

Abstract

This paper studies inference for the mean vector of a high-dimensional $U$-statistic. In the era of big data, the dimension $d$ of the $U$-statistic and the sample size $n$ of the observations both tend to be large, and the computation of the $U$-statistic is prohibitively demanding. Data-dependent inferential procedures such as the empirical bootstrap for $U$-statistics are even more computationally expensive. To overcome this computational bottleneck, incomplete $U$-statistics, obtained by sampling fewer terms of the $U$-statistic, are attractive alternatives. In this paper, we introduce randomized incomplete $U$-statistics with sparse weights whose computational cost can be made independent of the order of the $U$-statistic. We derive nonasymptotic Gaussian approximation error bounds for the randomized incomplete $U$-statistics in high dimensions, namely in cases where the dimension $d$ is possibly much larger than the sample size $n$, for both nondegenerate and degenerate kernels. In addition, we propose generic bootstrap methods for the incomplete $U$-statistics that are computationally much less demanding than existing bootstrap methods, and establish finite sample validity of the proposed bootstrap methods. We illustrate our methods with an application to nonparametric testing for the pairwise independence of a high-dimensional random vector under weaker assumptions than those appearing in the literature.
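To make the idea of sampling fewer terms concrete, the following is a minimal sketch, not the authors' implementation. It assumes a scalar order-2 kernel (the paper treats vector-valued kernels of general order), and all function names and the `budget` parameter are hypothetical. It compares a complete $U$-statistic with two randomized incomplete versions suggested by the keywords: Bernoulli sampling and sampling with replacement. Normalizing the Bernoulli version by the realized number of kept pairs is an assumption of this sketch.

```python
import itertools
import random


def variance_kernel(x, y):
    """Order-2 symmetric kernel; its complete U-statistic is the sample variance."""
    return 0.5 * (x - y) ** 2


def complete_u_statistic(data, kernel):
    """Average the kernel over all n-choose-2 unordered pairs."""
    values = [kernel(x, y) for x, y in itertools.combinations(data, 2)]
    return sum(values) / len(values)


def incomplete_u_bernoulli(data, kernel, budget, rng):
    """Keep each pair independently with probability budget / (number of pairs).

    Assumption of this sketch: normalize by the realized number of kept pairs.
    The loop visits every pair for clarity, so it does not itself save time.
    """
    n = len(data)
    n_pairs = n * (n - 1) // 2
    p = min(1.0, budget / n_pairs)
    total, kept = 0.0, 0
    for x, y in itertools.combinations(data, 2):
        if rng.random() < p:
            total += kernel(x, y)
            kept += 1
    if kept == 0:
        raise ValueError("no pairs were sampled; increase the budget")
    return total / kept


def incomplete_u_with_replacement(data, kernel, budget, rng):
    """Average the kernel over `budget` pairs drawn uniformly with replacement."""
    n = len(data)
    total = 0.0
    for _ in range(budget):
        i, j = rng.sample(range(n), 2)  # two distinct indices, uniformly at random
        total += kernel(data[i], data[j])
    return total / budget


if __name__ == "__main__":
    rng = random.Random(2019)
    sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print("complete:        ", complete_u_statistic(sample, variance_kernel))
    print("Bernoulli:       ", incomplete_u_bernoulli(sample, variance_kernel, 400, rng))
    print("with replacement:", incomplete_u_with_replacement(sample, variance_kernel, 400, rng))
```

The with-replacement version evaluates the kernel exactly `budget` times regardless of how many pairs exist, which is the kind of saving the abstract refers to. The Bernoulli version here still iterates over all pairs and is meant only to show the sampling scheme.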

Citation

Xiaohui Chen, Kengo Kato. "Randomized incomplete $U$-statistics in high dimensions." Ann. Statist. 47(6): 3127-3156, December 2019. https://doi.org/10.1214/18-AOS1773

Information

Received: 1 December 2017; Revised: 1 October 2018; Published: December 2019
First available in Project Euclid: 31 October 2019

Digital Object Identifier: 10.1214/18-AOS1773

Subjects:
Primary: 62E17, 62F40
Secondary: 62H15

Keywords: Bernoulli sampling, bootstrap, divide and conquer, Gaussian approximation, incomplete $U$-statistics, randomized inference, sampling with replacement

Rights: Copyright © 2019 Institute of Mathematical Statistics

Vol. 47 • No. 6 • December 2019