The Annals of Applied Probability

A random matrix approach to neural networks

Cosme Louart, Zhenyu Liao, and Romain Couillet


Abstract

This article studies the Gram random matrix model $G=\frac{1}{T}\Sigma^{\mathsf{T}}\Sigma$, $\Sigma=\sigma(WX)$, classically found in the analysis of random feature maps and random neural networks, where $X=[x_{1},\ldots,x_{T}]\in\mathbb{R}^{p\times T}$ is a (data) matrix of bounded norm, $W\in\mathbb{R}^{n\times p}$ is a matrix of independent zero-mean, unit-variance entries, and $\sigma:\mathbb{R}\to\mathbb{R}$ is a Lipschitz continuous (activation) function, with $\sigma(WX)$ understood entry-wise. By means of a key concentration of measure lemma arising from nonasymptotic random matrix arguments, we prove that, as $n,p,T$ grow large at the same rate, the resolvent $Q=(G+\gamma I_{T})^{-1}$, for $\gamma>0$, behaves similarly to its counterpart in sample covariance matrix models, notably involving the moment $\Phi=\frac{T}{n}\mathrm{E}[G]$; this provides in passing a deterministic equivalent for the empirical spectral measure of $G$. Application-wise, this result enables the estimation of the asymptotic performance of single-layer random neural networks. This in turn provides practical insights into the underlying mechanisms at play in random neural networks, entailing several unexpected consequences, as well as a fast practical means to tune the network hyperparameters.
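
As a rough illustration of the objects defined in the abstract, the following sketch (a Python/NumPy example written for this summary, not code from the paper; the dimensions, the Gaussian data, and the ReLU choice of $\sigma$ are arbitrary assumptions) builds one realization of $G$ and its resolvent $Q$, and estimates the moment $\Phi=\frac{T}{n}\mathrm{E}[G]$ by Monte Carlo over independent draws of $W$.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, T = 512, 256, 400   # network width, data dimension, number of samples
gamma = 1e-1              # regularization parameter gamma > 0

X = rng.standard_normal((p, T)) / np.sqrt(p)   # data matrix of bounded norm (assumption)
sigma = lambda t: np.maximum(t, 0.0)           # Lipschitz activation (ReLU, assumption)

def gram(W):
    """Gram matrix G = (1/T) Sigma^T Sigma with Sigma = sigma(W X)."""
    Sigma = sigma(W @ X)
    return (Sigma.T @ Sigma) / T

# One realization of W (i.i.d. zero-mean, unit-variance entries) and its resolvent.
W = rng.standard_normal((n, p))
G = gram(W)
Q = np.linalg.inv(G + gamma * np.eye(T))

# Monte Carlo estimate of Phi = (T/n) E[G] over independent draws of W.
n_mc = 50
Phi = (T / n) * np.mean([gram(rng.standard_normal((n, p))) for _ in range(n_mc)], axis=0)

eigs = np.linalg.eigvalsh(G)
print("spectrum of G: min %.3f, max %.3f" % (eigs.min(), eigs.max()))
print("operator norm of estimated Phi: %.3f" % np.linalg.norm(Phi, 2))
```

In the regime studied in the paper ($n,p,T$ growing at the same rate), the empirical spectral measure of $G$ and the resolvent $Q$ admit deterministic equivalents expressed through $\Phi$; the sketch above only sets up the random objects so that such asymptotics can be checked numerically.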

Article information

Source
Ann. Appl. Probab., Volume 28, Number 2 (2018), 1190-1248.

Dates
Received: February 2017
Revised: June 2017
First available in Project Euclid: 11 April 2018

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1523433634

Digital Object Identifier
doi:10.1214/17-AAP1328

Mathematical Reviews number (MathSciNet)
MR3784498

Zentralblatt MATH identifier
06897953

Subjects
Primary: 60B20: Random matrices (probabilistic aspects; for algebraic aspects see 15B52)
Secondary: 62M45: Neural nets and related approaches

Keywords
Random matrix theory; random feature maps; neural networks

Citation

Louart, Cosme; Liao, Zhenyu; Couillet, Romain. A random matrix approach to neural networks. Ann. Appl. Probab. 28 (2018), no. 2, 1190–1248. doi:10.1214/17-AAP1328. https://projecteuclid.org/euclid.aoap/1523433634
