
The geometry of least squares in the 21st century

Jonathan Taylor



It has been over 200 years since Gauss’s and Legendre’s famous priority dispute over who discovered the method of least squares. Nevertheless, we argue that the normal equations are still relevant in many facets of modern statistics, particularly in the domain of high-dimensional inference. Even today, we are still learning new things about the law of large numbers, first described in Bernoulli’s Ars Conjectandi 300 years ago, as it applies to high-dimensional inference.

The other insight the normal equations provide is the asymptotic Gaussianity of the least squares estimators. Gaussian processes, the general form of the Gaussian distribution, are another tool used in modern high-dimensional inference. The Gaussian distribution also arises via the central limit theorem in describing weak convergence of the usual least squares estimators. In terms of high-dimensional inference, we are still missing the right notion of weak convergence.

In this mostly expository work, we try to describe how the normal equations and the theory of Gaussian processes, which together we refer to as the “geometry of least squares,” apply to many questions of current interest.
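As a small illustration of the normal equations mentioned above (a sketch that is not taken from the paper), the following fits a straight line by solving the 2x2 system X^T X b = X^T y and then checks the geometric fact behind it: the residual vector is orthogonal to every column of the design matrix. The function names and the toy data are hypothetical and chosen only for this example.

```python
def solve_normal_equations(xs, ys):
    """Fit y ~ b0 + b1 * x by solving the normal equations X^T X b = X^T y.

    The design matrix X has a column of ones and a column of xs, so
    X^T X = [[n, sum x], [sum x, sum x^2]] and X^T y = [sum y, sum x*y].
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("design matrix is rank deficient")
    # Closed-form inverse of the 2x2 matrix X^T X applied to X^T y.
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Return the residual vector y - X b for the fitted line."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Toy data (hypothetical), roughly following y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.9, 5.2, 6.9]

b0, b1 = solve_normal_equations(xs, ys)
r = residuals(xs, ys, b0, b1)

# Orthogonality of the residual to the two columns of X (ones and xs).
dot_intercept = sum(r)
dot_slope = sum(ri * x for ri, x in zip(r, xs))
```

Both inner products vanish up to floating-point error, which is the normal equations X^T (y - X b) = 0 read geometrically: the fitted values are the orthogonal projection of y onto the column space of X.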

Article information

Bernoulli, Volume 19, Number 4 (2013), 1449-1464.

First available in Project Euclid: 27 August 2013


Keywords: convex analysis; Gaussian processes; least squares; penalized regression


Taylor, Jonathan. The geometry of least squares in the 21st century. Bernoulli 19 (2013), no. 4, 1449–1464. doi:10.3150/12-BEJSP15.


