The Annals of Statistics

Interaction pursuit in high-dimensional multi-response regression via distance correlation

Yinfei Kong, Daoji Li, Yingying Fan, and Jinchi Lv

Feature interactions can contribute a large proportion of the variation in many prediction models. In the era of big data, the coexistence of high dimensionality in both responses and covariates poses unprecedented challenges for identifying important interactions. In this paper, we propose a two-stage interaction identification method for high-dimensional multi-response interaction models, called interaction pursuit via distance correlation (IPDC), which first applies distance-correlation-based feature screening to transformed variables and then performs feature selection. The procedure is computationally efficient, generally applicable beyond the heredity assumption, and effective even when the number of responses diverges with the sample size. Under mild regularity conditions, we show that the method enjoys desirable theoretical properties, including the sure screening property, support union recovery, and oracle inequalities in prediction and estimation for both interactions and main effects. The advantages of our method are supported by several simulation studies and a real data analysis.
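The screening stage summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes, as the keyword "square transformation" suggests, that the transformed variables are the element-wise squares X_j^2 and Y∘Y; it computes the sample distance correlation with the double-centering definition of Székely, Rizzo and Bakirov [18]; and it stops at forming candidate interaction pairs, leaving out the post-screening selection step. The helper names (`screen_squared`, `candidate_interactions`) and the cutoff `keep` are hypothetical.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

Sample = List[Sequence[float]]  # n observations, each a (possibly 1-dim) vector


def double_centered(points: Sample) -> List[List[float]]:
    """Double-centered Euclidean distance matrix (Szekely, Rizzo and Bakirov, 2007)."""
    n = len(points)
    d = [[math.dist(points[k], points[l]) for l in range(n)] for k in range(n)]
    means = [sum(row) / n for row in d]  # row means equal column means (d is symmetric)
    grand = sum(means) / n
    return [[d[k][l] - means[k] - means[l] + grand for l in range(n)] for k in range(n)]


def dcor_from_centered(a: List[List[float]], b: List[List[float]]) -> float:
    """Sample distance correlation from two double-centered distance matrices."""
    n = len(a)
    dcov2 = sum(a[k][l] * b[k][l] for k in range(n) for l in range(n)) / n**2
    dvar2_a = sum(v * v for row in a for v in row) / n**2
    dvar2_b = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar2_a * dvar2_b)
    if denom == 0.0:
        return 0.0  # convention when either sample is constant
    return math.sqrt(max(dcov2, 0.0) / denom)  # clamp tiny negative rounding error


def distance_correlation(x: Sample, y: Sample) -> float:
    """Sample distance correlation between paired observations x[i] and y[i]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    return dcor_from_centered(double_centered(x), double_centered(y))


def screen_squared(X: List[List[float]], Y: List[List[float]], keep: int) -> List[int]:
    """Hypothetical helper: rank covariates j by dCor(X_j^2, Y*Y elementwise), keep top ones."""
    b = double_centered([[v * v for v in row] for row in Y])  # squared responses, computed once
    scores = []
    for j in range(len(X[0])):
        a = double_centered([[row[j] ** 2] for row in X])  # squared covariate j
        scores.append((dcor_from_centered(a, b), j))
    scores.sort(reverse=True)  # largest distance correlation first
    return [j for _, j in scores[:keep]]


def candidate_interactions(retained: List[int]) -> List[Tuple[int, int]]:
    """Hypothetical helper: all pairs of retained variables, as input to a selection stage."""
    return list(itertools.combinations(sorted(retained), 2))


if __name__ == "__main__":
    random.seed(0)
    n, p, q = 200, 10, 3
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    # Each response is driven by the single interaction X_0 * X_1 plus small noise.
    Y = [[row[0] * row[1] + 0.1 * random.gauss(0, 1) for _ in range(q)] for row in X]
    kept = screen_squared(X, Y, keep=2)
    print(sorted(kept), candidate_interactions(kept))
```

This direct computation costs O(n^2) time and memory per covariate; Huo and Székely [10] study faster computation of distance covariance, which matters when the number of covariates is large.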

Article information

Ann. Statist., Volume 45, Number 2 (2017), 897–922.

Received: December 2015
First available in Project Euclid: 16 May 2017

Mathematics Subject Classification
Primary: 62H12 (Estimation); 62J02 (General nonlinear regression)
Secondary: 62F07 (Ranking and selection); 62F12 (Asymptotic properties of estimators)

Keywords: interaction pursuit; distance correlation; square transformation; multi-response regression; high dimensionality; sparsity


Kong, Yinfei; Li, Daoji; Fan, Yingying; Lv, Jinchi. Interaction pursuit in high-dimensional multi-response regression via distance correlation. Ann. Statist. 45 (2017), no. 2, 897–922. doi:10.1214/16-AOS1474.


References

  • [1] Bien, J., Taylor, J. and Tibshirani, R. (2013). A LASSO for hierarchical interactions. Ann. Statist. 41 1111–1141.
  • [2] Chen, L. and Huang, J. Z. (2012). Sparse reduced-rank regression for simultaneous dimension reduction and variable selection. J. Amer. Statist. Assoc. 107 1533–1545.
  • [3] Choi, N. H., Li, W. and Zhu, J. (2010). Variable selection with the strong heredity constraint and its oracle property. J. Amer. Statist. Assoc. 105 354–364.
  • [4] Chun, H. and Keleş, S. (2010). Sparse partial least squares regression for simultaneous dimension reduction and variable selection. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 3–25.
  • [5] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 849–911.
  • [6] Fan, Y., Kong, Y., Li, D. and Lv, J. (2016). Interaction pursuit with feature screening and selection. Preprint. Available at arXiv:1605.08933.
  • [7] Fan, Y., Kong, Y., Li, D. and Zheng, Z. (2015). Innovated interaction screening for high-dimensional nonlinear classification. Ann. Statist. 43 1243–1272.
  • [8] Hall, P. and Xue, J.-H. (2014). On selecting interacting features from high-dimensional data. Comput. Statist. Data Anal. 71 694–708.
  • [9] Hao, N. and Zhang, H. H. (2014). Interaction screening for ultrahigh-dimensional data. J. Amer. Statist. Assoc. 109 1285–1301.
  • [10] Huo, X. and Székely, G. J. (2016). Fast computing for distance covariance. Technometrics. To appear.
  • [11] Jiang, B. and Liu, J. S. (2014). Variable selection for general index models via sliced inverse regression. Ann. Statist. 42 1751–1786.
  • [12] Kong, Y., Li, D., Fan, Y. and Lv, J. (2016). Supplement to “Interaction pursuit in high-dimensional multi-response regression via distance correlation.” DOI:10.1214/16-AOS1474SUPP.
  • [13] Li, J., Zhong, W., Li, R. and Wu, R. (2014). A fast algorithm for detecting gene-gene interactions in genome-wide association studies. Ann. Appl. Stat. 8 2292–2318.
  • [14] Li, R., Zhong, W. and Zhu, L. (2012). Feature screening via distance correlation learning. J. Amer. Statist. Assoc. 107 1129–1139.
  • [15] Lv, J. (2013). Impacts of high dimensionality in finite samples. Ann. Statist. 41 2236–2262.
  • [16] Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.
  • [17] Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. and Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9 3273–3297.
  • [18] Székely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Ann. Statist. 35 2769–2794.
  • [19] Yuan, M., Joseph, V. R. and Zou, H. (2009). Structured variable selection and estimation. Ann. Appl. Stat. 3 1738–1757.
  • [20] Zhong, W. and Zhu, L. (2015). An iterative approach to distance correlation-based sure independence screening. J. Stat. Comput. Simul. 85 2331–2345.

Supplemental materials

  • Supplementary material to “Interaction pursuit in high-dimensional multi-response regression via distance correlation”. Due to space constraints, the details of the post-screening interaction selection, additional numerical studies, some intermediate steps of the proof of Theorem 1, and additional technical details are provided in the Supplementary Material [12].