Statistical Science

Covariance Estimation: The GLM and Regularization Perspectives

Mohsen Pourahmadi



Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment, where enforcing the positive-definiteness constraint can be computationally expensive. We survey the progress made in modeling covariance matrices from two relatively complementary perspectives: (1) generalized linear models (GLMs), or parsimony and the use of covariates, in low dimensions, and (2) regularization, or sparsity, for high-dimensional data. An emerging, unifying and powerful trend in both perspectives is that of reducing the covariance estimation problem to solving a sequence of regression problems. We point out several instances of this regression-based formulation; a notable case is the sparse estimation of a precision matrix, or Gaussian graphical model, which leads to the fast graphical lasso algorithm. Some advantages and limitations of the regression-based Cholesky decomposition relative to the classical spectral (eigenvalue) and variance-correlation decompositions are highlighted. The Cholesky-based approach provides an unconstrained and statistically interpretable reparameterization and guarantees the positive-definiteness of the estimated covariance matrix; it reduces the unintuitive task of covariance estimation to modeling a sequence of regressions, at the cost of imposing an a priori order among the variables. Elementwise regularization of the sample covariance matrix, such as banding, tapering and thresholding, has desirable asymptotic properties, and the resulting sparse covariance estimator is positive definite with probability tending to one as the sample size and dimension grow.
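The reduction of covariance estimation to a sequence of regressions can be made concrete with the modified Cholesky decomposition: regressing each variable on its predecessors (in the imposed order) gives a unit lower-triangular matrix T holding the negated regression coefficients and a diagonal matrix D of prediction-error variances, with T Σ T' = D and hence Σ⁻¹ = T' D⁻¹ T. The following NumPy sketch, not taken from the paper, illustrates the idea on a hypothetical AR(1)-type covariance matrix:

```python
import numpy as np

def modified_cholesky(S):
    """Modified Cholesky decomposition T S T' = D via sequential regressions.

    Row t of the unit lower-triangular T holds the negated coefficients of
    the regression of variable t on variables 0..t-1; the diagonal D holds
    the corresponding prediction-error (innovation) variances.
    """
    p = S.shape[0]
    T = np.eye(p)
    D = np.zeros(p)
    D[0] = S[0, 0]
    for t in range(1, p):
        # Regression coefficients of variable t on its predecessors.
        phi = np.linalg.solve(S[:t, :t], S[:t, t])
        T[t, :t] = -phi
        # Residual (innovation) variance of that regression.
        D[t] = S[t, t] - S[:t, t] @ phi
    return T, D

# Illustrative AR(1)-type covariance: Sigma_ij = rho^|i-j|.
rho = 0.5
S = rho ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
T, D = modified_cholesky(S)

# The decomposition is unconstrained: any T and positive D give back a
# positive-definite covariance. Check T S T' = D and the precision factor.
assert np.allclose(T @ S @ T.T, np.diag(D))
assert np.allclose(np.linalg.inv(S), T.T @ np.diag(1 / D) @ T)
```

Because the entries of T and log D are unconstrained, sparsity or smoothness penalties can be applied to them directly, and the reconstructed T' D⁻¹ T is automatically positive definite; the price, as noted above, is that the result depends on the chosen variable ordering.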

Article information

Statist. Sci., Volume 26, Number 3 (2011), 369-387.

First available in Project Euclid: 31 October 2011


Keywords: Bayesian estimation; Cholesky decomposition; dependence and correlation; graphical models; longitudinal data; parsimony; penalized likelihood; precision matrix; sparsity; spectral decomposition; variance-correlation decomposition


Pourahmadi, Mohsen. Covariance Estimation: The GLM and Regularization Perspectives. Statist. Sci. 26 (2011), no. 3, 369–387. doi:10.1214/11-STS358.


