December 2021
A simple measure of conditional dependence
Mona Azadkia, Sourav Chatterjee
Ann. Statist. 49(6): 3070-3102 (December 2021). DOI: 10.1214/21-AOS2073

Abstract

We propose a coefficient of conditional dependence between two random variables $Y$ and $Z$ given a set of other variables $X_1,\ldots,X_p$, based on an i.i.d. sample. The coefficient has a long list of desirable properties, the most important of which is that under absolutely no distributional assumptions, it converges to a limit in $[0,1]$, where the limit is 0 if and only if $Y$ and $Z$ are conditionally independent given $X_1,\ldots,X_p$, and is 1 if and only if $Y$ is equal to a measurable function of $Z$ given $X_1,\ldots,X_p$. Moreover, it has a natural interpretation as a nonlinear generalization of the familiar partial $R^2$ statistic for measuring conditional dependence by regression. Using this statistic, we devise a new variable selection algorithm, called Feature Ordering by Conditional Independence (FOCI), which is model-free, has no tuning parameters, and is provably consistent under sparsity assumptions. A number of applications to synthetic and real data sets are worked out.
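
For orientation, the classical linear partial R^2 that the abstract refers to can be sketched as follows. This is not the coefficient proposed in the paper and not code from the FOCI package. It is a minimal illustration, assuming a single conditioning variable x and ordinary least squares with an intercept. It uses the standard identity that the partial R^2 of z given x equals the squared correlation between the residuals of y on x and the residuals of z on x. The function names `residuals` and `partial_r2` are hypothetical.

```python
from statistics import fmean


def residuals(v, x):
    """Residuals from the least-squares regression of v on x (with intercept)."""
    mv, mx = fmean(v), fmean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [(vi - mv) - slope * (xi - mx) for xi, vi in zip(x, v)]


def partial_r2(y, z, x):
    """Classical (linear) partial R^2 of z for predicting y, given x.

    Computed as the squared correlation between the residuals of y on x
    and the residuals of z on x. Illustrative only: this is the linear
    statistic, not the nonlinear coefficient described in the abstract.
    """
    ry = residuals(y, x)
    rz = residuals(z, x)
    syy = sum(r * r for r in ry)
    szz = sum(r * r for r in rz)
    if syy == 0 or szz == 0:
        raise ValueError("y or z is an exact linear function of x")
    syz = sum(a * b for a, b in zip(ry, rz))
    return syz ** 2 / (syy * szz)


# Toy data (made up for illustration): y is an exact linear function of x and z.
x = [1, 2, 3, 4, 5]
z = [1, 0, 2, 0, 1]
y = [2 * xi + 3 * zi for xi, zi in zip(x, z)]
print(partial_r2(y, z, x))
```

In the toy data, y is fully determined by x and z, so the value is 1 up to rounding. A linear statistic of this kind can miss nonlinear conditional dependence, which is the gap the abstract says the proposed coefficient addresses.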

Funding Statement

The second author was supported in part by NSF Grants DMS-1608249 and DMS-1855484.

Acknowledgments

We are grateful to Mohsen Bayati, Persi Diaconis, Adityanand Guntuboyina, Susan Holmes, Bodhisattva Sen and Rob Tibshirani for helpful comments, and to Nima Hamidi, Norm Matloff and Balasubramanian Narasimhan for help with preparing the R package FOCI. We also thank the anonymous referees and the Associate Editor for various useful suggestions that helped improve the paper.

Citation

Mona Azadkia, Sourav Chatterjee. "A simple measure of conditional dependence." Ann. Statist. 49 (6) 3070-3102, December 2021. https://doi.org/10.1214/21-AOS2073

Information

Received: 1 July 2020; Revised: 1 January 2021; Published: December 2021
First available in Project Euclid: 14 December 2021

MathSciNet: MR4352523
zbMATH: 1486.62175
Digital Object Identifier: 10.1214/21-AOS2073

Subjects:
Primary: 62G05, 62H20

Keywords: conditional dependence, nonparametric measures of association, variable selection

Rights: Copyright © 2021 Institute of Mathematical Statistics

JOURNAL ARTICLE
33 PAGES
