## Electronic Journal of Statistics

### Supervised dimensionality reduction via distance correlation maximization

#### Abstract

We propose a novel formulation for supervised dimensionality reduction based on a nonlinear dependency criterion called statistical distance correlation (Székely et al., 2007). The objective is free of distributional assumptions on the regression variables and of regression model assumptions. The formulation learns a low-dimensional feature representation $\mathbf{z}$ that maximizes the squared sum of distance correlations between the low-dimensional features $\mathbf{z}$ and the response $y$, and between the features $\mathbf{z}$ and the covariates $\mathbf{x}$. We optimize this objective with a novel algorithm based on the Generalized Majorization-Minimization method of Parizi et al. (2015). Empirical results on multiple datasets show that the approach outperforms several relevant state-of-the-art supervised dimensionality reduction methods.
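To make the dependency criterion concrete, the following is a minimal NumPy sketch of the sample distance correlation of Székely et al. (2007), computed from double-centered pairwise distance matrices. The function names `distance_correlation` and `objective` are hypothetical and not taken from the paper. `objective` reads the abstract's "squared sum" as a sum of two squared distance correlations, which is an assumption, and the sketch only evaluates that quantity; it does not implement the paper's optimization algorithm.

```python
import numpy as np


def _centered_distances(a):
    """Double-centered pairwise Euclidean distance matrix of the rows of a."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D array as n samples of a scalar variable
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    row_means = d.mean(axis=1, keepdims=True)
    col_means = d.mean(axis=0, keepdims=True)
    return d - row_means - col_means + d.mean()


def distance_correlation(a, b):
    """Sample distance correlation between paired samples a and b (n rows each)."""
    A = _centered_distances(a)
    B = _centered_distances(b)
    dcov2 = (A * B).mean()    # squared sample distance covariance
    dvar2_a = (A * A).mean()  # squared sample distance variance of a
    dvar2_b = (B * B).mean()  # squared sample distance variance of b
    denom = np.sqrt(dvar2_a * dvar2_b)
    if denom == 0.0:
        return 0.0  # convention when either sample is constant
    # Clip tiny negative values caused by floating-point round-off.
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def objective(z, x, y):
    """ASSUMPTION: sum of squared distance correlations of z with y and with x."""
    return distance_correlation(z, y) ** 2 + distance_correlation(z, x) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 5))        # covariates
    y = np.sin(x[:, 0]) + x[:, 1] ** 2   # nonlinear response
    z = x[:, :2]                         # a candidate 2-D representation
    print("dCor(z, y):", distance_correlation(z, y))
    print("objective :", objective(z, x, y))
```

Because distance correlation depends only on pairwise distances, the covariates, the response, and the learned features may have different dimensions, which is what lets a single criterion relate $\mathbf{z}$ to both $\mathbf{x}$ and $y$.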

#### Article information

Source
Electron. J. Statist., Volume 12, Number 1 (2018), 960-984.

Dates
First available in Project Euclid: 9 March 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1520586206

Digital Object Identifier
doi:10.1214/18-EJS1403

Mathematical Reviews number (MathSciNet)
MR3772810

Zentralblatt MATH identifier
06864482

#### Citation

Vepakomma, Praneeth; Tonde, Chetan; Elgammal, Ahmed. Supervised dimensionality reduction via distance correlation maximization. Electron. J. Statist. 12 (2018), no. 1, 960–984. doi:10.1214/18-EJS1403. https://projecteuclid.org/euclid.ejs/1520586206

#### References

• Amari, S.-I. (1998). Natural gradient works efficiently in learning. Neural Computation, 10(2):251–276.
• Berrendero, J. R., Cuevas, A., and Torrecilla, J. L. (2014). Variable selection in functional data classification: a maxima-hunting proposal. Statistica Sinica.
• Borg, I. and Groenen, P. J. (2005). Modern multidimensional scaling: Theory and applications. Springer Science & Business Media.
• Buza, K. (2014). Feedback prediction for blogs. In Data analysis, machine learning and knowledge discovery, pages 145–152. Springer.
• Chechik, G., Globerson, A., Tishby, N., and Weiss, Y. (2005). Information bottleneck for Gaussian variables. Journal of Machine Learning Research.
• Chung, F. (1997). Lecture notes on spectral graph theory. Providence, RI: AMS Publications.
• Cook, R. and Forzani, L. (2009). Likelihood based sufficient dimension reduction. Journal of the American Statistical Association, 104:197–208.
• Cook, R. D. (1996). Graphics for regressions with a binary response. Journal of the American Statistical Association, 91(435):983–992.
• Dinkelbach, W. (1967). On nonlinear fractional programming. Management Science. Journal of the Institute of Management Science. Application and Theory Series, 13(7):492–498.
• Fukumizu, K. and Leng, C. (2014). Gradient-based kernel dimension reduction for regression. Journal of the American Statistical Association, 109(505):359–370.
• Graf, F., Kriegel, H.-P., Schubert, M., Pölsterl, S., and Cavallaro, A. (2011). 2D image registration in CT images using radial image descriptors. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2011, pages 607–614. Springer.
• Harrison, D. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1):81–102.
• Kiefer, J. (1953). Sequential minimax search for a maximum. Proceedings of the American Mathematical Society, 4(3):502–506.
• Kong, J., Wang, S., and Wahba, G. (2015). Using distance covariance for improved variable selection with application to learning genetic risk models. Statistics in Medicine, 34(10):1708–1720.
• Lange, K. (2013). The MM algorithm. In Optimization, volume 95 of Springer Texts in Statistics, pages 185–219. Springer New York.
• Lange, K., Hunter, D. R., and Yang, I. (2000). Optimization transfer using surrogate objective functions. Journal of Computational and Graphical Statistics, 9(1):1.
• Li, K.-C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–327.
• Li, R., Zhong, W., and Zhu, L. (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107(499):1129–1139.
• Lichman, M. (2013). UCI machine learning repository.
• Lue, H. H. (2009). Sliced inverse regression for multivariate response regression. Journal of Statistical Planning and Inference, 139:2656–2664.
• Nishimori, Y. and Akaho, S. (2005). Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold. Neurocomputing, 67:106–135.
• Slonim, N. (2002). The information bottleneck: Theory and applications. Ph.D. thesis, Hebrew University of Jerusalem.
• Parizi, S. N., He, K., Sclaroff, S., and Felzenszwalb, P. (2015). Generalized majorization-minimization. arXiv preprint arXiv:1506.07613.
• Shwartz-Ziv, R. and Tishby, N. (2017). Opening the black box of deep neural networks via information. https://arxiv.org/abs/1703.00810.
• Schaible, S. (1976). Minimization of ratios. Journal of Optimization Theory and Applications, 19(2):347–352.
• Shao, Y., Cook, R., and Weisberg, S. (2007). Marginal tests with sliced average variance estimation. Biometrika, 94:285–296.
• Shao, Y., Cook, R., and Weisberg, S. (2009). Partial central subspace and sliced average variance estimation. Journal of Statistical Planning and Inference, 139:952–961.
• Sheng, W. and Yin, X. (2016). Sufficient dimension reduction via distance covariance. Journal of Computational and Graphical Statistics, 25(1):91–104.
• Sugiyama, M., Suzuki, T., and Kanamori, T. (2012). Density ratio estimation in machine learning. Cambridge University Press, New York, NY, USA, 1st edition.
• Suzuki, T. and Sugiyama, M. (2013). Sufficient dimension reduction via squared-loss mutual information estimation. Neural Computation, 25(3):725–758.
• Székely, G. J. and Rizzo, M. L. (2012). On the uniqueness of distance covariance. Statistics & Probability Letters, 82(12):2278–2282.
• Székely, G. J. and Rizzo, M. L. (2013). The distance correlation t-test of independence in high dimension. Journal of Multivariate Analysis, 117:193–213.
• Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6):2769–2794.
• Székely, G. J., Rizzo, M. L., et al. (2009). Brownian distance covariance. Annals of Applied Statistics, 3(4):1236–1265.
• Szretter, M. E. and Yohai, V. J. (2009). The sliced inverse regression algorithm as a maximum likelihood procedure. Journal of Statistical Planning and Inference, 139:3570–3578.
• Tishby, N., Pereira, F. C., and Bialek, W. (1999). The information bottleneck method. The 37th annual Allerton Conference on Communication, Control, and Computing.
• Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17(4):401–419.
• Torres-Sospedra, J., Montoliu, R., Martínez-Usó, A., Avariento, J. P., Arnau, T. J., Benedito-Bordonau, M., and Huerta, J. (2014). UJIIndoorLoc: A new multi-building and multi-floor database for WLAN fingerprint-based indoor localization problems. In Proceedings of the fifth conference on indoor positioning and indoor navigation.
• Vapnik, V., Braga, I., and Izmailov, R. (2015). Constructive setting for problems of density ratio estimation. Statistical Analysis and Data Mining: The ASA Data Science Journal, 8(3):137–146.
• Chen, X., Cook, R. D., and Zou, C. (2015). Diagnostic studies in sufficient dimension reduction. Biometrika, 102(3):545–558.
• Yamada, M., Niu, G., Takagi, J., and Sugiyama, M. (2011). Sufficient component analysis for supervised dimension reduction. arXiv preprint arXiv:1103.4998.
• Zhang, A. (2008). Quadratic fractional programming problems with quadratic constraints. PhD thesis, Kyoto University.
• Zhang, Y., Tapia, R., and Velazquez, L. (2000). On convergence of minimization methods: attraction, repulsion, and selection. Journal of Optimization Theory and Applications, 107(3):529–546.
• Zhou, F., Claire, Q., and King, R. D. (2014). Predicting the geographical origin of music. In Data Mining (ICDM), 2014 IEEE International Conference on, pages 1115–1120. IEEE.