Linearized two-layers neural networks in high dimension

Behrooz Ghorbani; Song Mei; Theodor Misiakiewicz; Andrea Montanari

doi:10.1214/20-AOS1990

Abstract

We consider the problem of learning an unknown function $f_{\star}$ on the $d$-dimensional sphere with respect to the square loss, given i.i.d. samples $(y_i, \boldsymbol{x}_i)_{i \le n}$, where $\boldsymbol{x}_i$ is a feature vector uniformly distributed on the sphere and $y_i = f_{\star}(\boldsymbol{x}_i) + \varepsilon_i$. We study two popular classes of models that can be regarded as linearizations of two-layers neural networks around a random initialization: the random features model of Rahimi–Recht (RF) and the neural tangent model of Jacot–Gabriel–Hongler (NT). Both these models can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons $N$ diverges, for a fixed dimension $d$.
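As a rough illustration of the two model classes named above, the following sketch builds RF and NT feature maps with NumPy and fits them by ridge regression. It is not taken from the paper: the ReLU activation, the radius-$\sqrt{d}$ normalization of the inputs, the unit-norm random weights, and the helper names `sample_sphere`, `rf_features`, `nt_features` and `ridge_fit` are all assumptions made for the example.

```python
import numpy as np


def sample_sphere(n, d, rng, radius=None):
    """Draw n points uniformly on a sphere in R^d (radius sqrt(d) by default, an assumption)."""
    if radius is None:
        radius = np.sqrt(d)
    g = rng.standard_normal((n, d))
    return radius * g / np.linalg.norm(g, axis=1, keepdims=True)


def rf_features(X, W):
    """Random features: sigma(<w_i, x>) for each of the N neurons; shape (n, N).

    sigma is taken to be ReLU here (an assumption for illustration).
    """
    return np.maximum(X @ W.T, 0.0)


def nt_features(X, W):
    """Neural tangent features: x * sigma'(<w_i, x>) for each neuron; shape (n, N * d).

    With ReLU, sigma' is the indicator of a positive pre-activation.
    """
    gate = (X @ W.T > 0).astype(X.dtype)           # (n, N)
    feats = gate[:, :, None] * X[:, None, :]       # (n, N, d)
    return feats.reshape(X.shape[0], -1)


def ridge_fit(Phi, y, lam):
    """Solve the ridge problem min ||Phi @ theta - y||^2 + lam * ||theta||^2."""
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, N = 200, 10, 50
    X = sample_sphere(n, d, rng)
    W = sample_sphere(N, d, rng, radius=1.0)       # random first-layer weights, kept fixed
    y = X[:, 0] + 0.1 * rng.standard_normal(n)     # toy linear target plus noise
    theta_rf = ridge_fit(rf_features(X, W), y, lam=1e-3)
    theta_nt = ridge_fit(nt_features(X, W), y, lam=1e-3)
    print(theta_rf.shape, theta_nt.shape)
```

In both cases only the second-layer coefficients are trained while the random weights `W` stay fixed, which is what makes the models linear in their parameters; RF has $N$ trainable coefficients and NT has $Nd$.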

We consider two specific regimes: the infinite-sample finite-width regime, in which $n = \infty$ while $d$ and $N$ are large but finite, and the infinite-width finite-sample regime, in which $N = \infty$ while $d$ and $n$ are large but finite. In the first regime, we prove that if $d^{\ell + \delta} \le N \le d^{\ell + 1 - \delta}$ for small $\delta > 0$, then RF effectively fits a degree-$\ell$ polynomial in the raw features, and NT fits a degree-$(\ell + 1)$ polynomial. In the second regime, both RF and NT reduce to kernel methods with rotationally invariant kernels. We prove that, if the sample size satisfies $d^{\ell + \delta} \le n \le d^{\ell + 1 - \delta}$, then kernel methods can fit at most a degree-$\ell$ polynomial in the raw features. This lower bound is achieved by kernel ridge regression, and near-optimal prediction error is achieved for vanishing ridge regularization.
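To make the scaling conditions above concrete, the following sketch computes the integer $\ell$ for which $d^{\ell+\delta} \le N \le d^{\ell+1-\delta}$ and reports the polynomial degrees the abstract associates with RF and NT. The statements in the abstract are asymptotic in $d$, so this is only arithmetic on the exponents, not a finite-sample guarantee; the function names `degree_exponent` and `fitted_degrees` are hypothetical.

```python
import math


def degree_exponent(d, size, delta=0.0):
    """Return the integer l with d**(l + delta) <= size <= d**(l + 1 - delta).

    Returns None when size falls within the delta-margin of a power of d,
    where the abstract's condition does not apply.
    """
    if d <= 1 or size < 1:
        raise ValueError("need d > 1 and size >= 1")
    ratio = math.log(size) / math.log(d)
    ell = math.floor(ratio)
    frac = ratio - ell
    if frac < delta or frac > 1 - delta:
        return None
    return ell


def fitted_degrees(d, N, delta=0.0):
    """Degrees attributed to RF and NT in the infinite-sample, finite-width regime."""
    ell = degree_exponent(d, N, delta)
    if ell is None:
        return None
    return {"RF": ell, "NT": ell + 1}


if __name__ == "__main__":
    # d = 100 and N = 50000 give log N / log d of about 2.35, so l = 2.
    print(fitted_degrees(100, 50000, delta=0.1))
```

The same `degree_exponent` call with the sample size $n$ in place of $N$ gives the degree $\ell$ that the abstract attributes to kernel methods in the infinite-width, finite-sample regime.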

Citation


Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari. "Linearized two-layers neural networks in high dimension." Ann. Statist. 49 (2) 1029-1054, April 2021. https://doi.org/10.1214/20-AOS1990

Information

Received: 1 July 2019; Revised: 1 June 2020; Published: April 2021

First available in Project Euclid: 2 April 2021

Digital Object Identifier: 10.1214/20-AOS1990

Subjects:

Primary: 62G08

Secondary: 62J07

Keywords: approximation bounds, kernel ridge regression, neural tangent kernel, random features, two-layers neural networks
