Statistical Science

Model Selection in Linear Mixed Models

Samuel Müller, J. L. Scealy, and A. H. Welsh

Full-text: Open access


Linear mixed effects models are highly flexible in handling a broad range of data types and are therefore widely used in applications. A key part of any data analysis is model selection, which often aims to choose a parsimonious model with other desirable properties from a possibly very large set of candidate statistical models. Over the last 5–10 years the literature on model selection in linear mixed models has grown extremely rapidly. The problem is much more complicated than in linear regression because selection on the covariance structure is not straightforward, due to computational issues and boundary problems arising from positive semidefinite constraints on covariance matrices. To obtain a better understanding of the available methods, their properties and the relationships between them, we review a large body of literature on linear mixed model selection. We arrange, implement, discuss and compare model selection methods based on four major approaches: information criteria such as AIC or BIC, shrinkage methods based on penalized loss functions such as LASSO, the Fence procedure and Bayesian techniques.
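To make the information-criterion approach mentioned above concrete, the sketch below computes a marginal AIC for a random-intercept linear mixed model directly from the marginal likelihood. This is an illustrative NumPy/SciPy implementation, not code from the paper: the function name `marginal_aic` is ours, the fixed effects are profiled out by GLS, the two variance components are maximized numerically, and the simple parameter count (number of fixed effects plus two variance components) is only one of several degrees-of-freedom conventions discussed in the mixed-model selection literature.

```python
import numpy as np
from scipy.optimize import minimize

def marginal_aic(y, X, groups):
    """Marginal AIC of a random-intercept model y_ij = x_ij' beta + b_i + e_ij.

    Illustrative sketch only: beta is profiled out by GLS, and the marginal
    log-likelihood is maximized over (log s2_e, log s2_b) numerically.
    The count p = (#fixed effects) + 2 variance components is one simple
    convention; alternative effective-degrees-of-freedom counts exist.
    """
    ids = np.unique(groups)

    def neg_loglik(theta):
        s2e, s2b = np.exp(theta)  # error and random-intercept variances
        XtVX, XtVy, blocks = 0.0, 0.0, []
        for g in ids:  # per-cluster marginal covariance V_i = s2e*I + s2b*J
            Xi, yi = X[groups == g], y[groups == g]
            ni = len(yi)
            Vi = s2e * np.eye(ni) + s2b * np.ones((ni, ni))
            blocks.append((Xi, yi, Vi))
            Vinv_Xi = np.linalg.solve(Vi, Xi)
            XtVX = XtVX + Xi.T @ Vinv_Xi
            XtVy = XtVy + Vinv_Xi.T @ yi  # Vi symmetric, so this is Xi' Vi^-1 yi
        beta = np.linalg.solve(XtVX, XtVy)  # GLS fixed-effect estimate
        ll = 0.0
        for Xi, yi, Vi in blocks:
            ri = yi - Xi @ beta
            _, logdet = np.linalg.slogdet(Vi)
            ll -= 0.5 * (len(yi) * np.log(2 * np.pi) + logdet
                         + ri @ np.linalg.solve(Vi, ri))
        return -ll

    res = minimize(neg_loglik, np.zeros(2), method="Nelder-Mead")
    p = X.shape[1] + 2  # fixed effects + two variance parameters
    return 2.0 * res.fun + 2.0 * p

# Toy comparison of two candidate fixed-effect sets (simulated data):
rng = np.random.default_rng(0)
m, ni = 20, 5
groups = np.repeat(np.arange(m), ni)
x = rng.standard_normal(m * ni)
y = 1.0 + 3.0 * x + np.repeat(rng.standard_normal(m), ni) \
    + rng.standard_normal(m * ni)
X_full = np.column_stack([np.ones(m * ni), x])  # intercept + x
X_null = np.ones((m * ni, 1))                   # intercept only
```

The model with the smaller marginal AIC is preferred; with the strong simulated effect of x, the full model wins here. The paper's point stands even in this toy setting: the selection target (marginal versus conditional likelihood, and how to count variance parameters on the boundary) must be chosen before the criterion is meaningful.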

Article information

Statist. Sci., Volume 28, Number 2 (2013), 135–167.

First available in Project Euclid: 21 May 2013

Keywords: AIC; Bayes factor; BIC; Cholesky decomposition; fence; information criteria; LASSO; linear mixed model; model selection; shrinkage methods


Müller, Samuel; Scealy, J. L.; Welsh, A. H. Model Selection in Linear Mixed Models. Statist. Sci. 28 (2013), no. 2, 135–167. doi:10.1214/12-STS410.


