## Electronic Journal of Statistics

### Penalized regression, mixed effects models and appropriate modelling

#### Abstract

Linear mixed effects methods for the analysis of longitudinal data provide a convenient framework for modelling within-individual correlation across time. Using spline functions allows for flexible modelling of the response as a smooth function of time. A computational connection between linear mixed effects modelling and spline smoothing has resulted in a cross-fertilization of these two fields. The connection has popularized the use of spline functions in longitudinal data analysis and the use of mixed effects software in smoothing analyses. However, care must be taken in exploiting this connection, as resulting estimates of the underlying population mean might not track the data well and associated standard errors might not reflect the true variability in the data. We discuss these shortcomings and suggest some easy-to-compute methods to eliminate them.
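As a brief sketch of the computational connection mentioned above, in standard textbook notation (the symbols $X$, $Z$, $\beta$, $u$, $\sigma_u^2$, $\sigma_\varepsilon^2$ and $\lambda$ are illustrative assumptions, not notation taken from the paper): suppose the spline fit is written as a linear mixed model with unpenalized coefficients $\beta$ and penalized spline coefficients $u$ treated as random effects.

$$
y = X\beta + Zu + \varepsilon, \qquad u \sim N(0, \sigma_u^2 I), \qquad \varepsilon \sim N(0, \sigma_\varepsilon^2 I).
$$

For fixed variance components, the generalized least squares estimate of $\beta$ and the best linear unbiased predictor of $u$ jointly minimize a penalized least squares criterion:

$$
(\hat\beta, \hat u) = \arg\min_{\beta,\, u} \; \lVert y - X\beta - Zu \rVert^2 + \lambda \lVert u \rVert^2, \qquad \lambda = \frac{\sigma_\varepsilon^2}{\sigma_u^2}.
$$

Under these assumptions the smoothing parameter $\lambda$ is a ratio of variance components, which is why mixed effects software can be used to fit a penalized spline and to choose the amount of smoothing.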

#### Article information

Source
Electron. J. Statist., Volume 7 (2013), 1517-1552.

Dates
First available in Project Euclid: 29 May 2013

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1369836229

Digital Object Identifier
doi:10.1214/13-EJS809

Mathematical Reviews number (MathSciNet)
MR3066377

Zentralblatt MATH identifier
1327.62256

Subjects
Primary: 62G08: Nonparametric regression
Secondary: 62J99: None of the above, but in this section

#### Citation

Heckman, Nancy; Lockhart, Richard; Nielsen, Jason D. Penalized regression, mixed effects models and appropriate modelling. Electron. J. Statist. 7 (2013), 1517–1552. doi:10.1214/13-EJS809. https://projecteuclid.org/euclid.ejs/1369836229

#### References

• [1] B. A. Brumback, L. C. Brumback, and M. J. Lindstrom. Longitudinal Data Analysis, pages 291–318. Fitzmaurice, G., Davidian, M., Verbeke, G. & Molenberghs, G., eds. Handbooks of Modern Statistical Methods. Chapman & Hall/CRC Press, Boca Raton, Florida, 2009.
• [2] Ciprian M. Crainiceanu, David Ruppert, Raymond J. Carroll, Adarsh Joshi, and Billy Goodner. Spatially adaptive Bayesian penalized splines with heteroscedastic errors. Journal of Computational and Graphical Statistics, 16(2):265–88, 2007.
• [3] Eugene Demidenko. Mixed Models: Theory and Applications. Wiley Series in Probability and Statistics. Wiley-Interscience, Hoboken, NJ, 2004.
• [4] Viani A. B. Djeundje and Iain D. Currie. Appropriate covariance-specification via penalties for penalized splines in mixed models for longitudinal data. Electronic Journal of Statistics, 4:1202–1224, 2010.
• [5] M. Durban, J. Harezlak, M. P. Wand, and R. J. Carroll. Simple fitting of subject-specific curves for longitudinal data. Statistics in Medicine, 24(8):1153–67, 2005.
• [6] Paul H. C. Eilers and Brian D. Marx. Flexible smoothing with $B$-splines and penalties. Statistical Science, 11(2):89–121, 1996.
• [7] Paul H. C. Eilers and Brian D. Marx. Splines, knots and penalties. Wiley Interdisciplinary Reviews: Computational Statistics. 2010.
• [8] Garrett M. Fitzmaurice, Nan M. Laird, and James H. Ware. Applied Longitudinal Analysis. Wiley Series in Probability and Statistics. Wiley-Interscience, Hoboken, NJ, 2004.
• [9] D. G. Folk and T. J. Bradley. The evolution of recovery from desiccation stress in laboratory-selected populations of Drosophila melanogaster. The Journal of Experimental Biology, 207:2671–2678, 2004.
• [10] J. H. Friedman. Multivariate adaptive regression splines (with discussion). Annals of Statistics, 19:1–141, 1991.
• [11] A. Gilmour, B. Gogel, B. R. Cullis, and R. Thompson. ASReml User Guide Release 2.0. VSN International Ltd., Hemel Hempstead, U.K., 2006.
• [12] P. J. Green and B. W. Silverman. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1994.
• [13] N. Heckman, R. Lockhart, and J. D. Nielsen. Supplementary Material to “Regression, Mixed Effects Models and Appropriate Modelling”. DOI: 10.1214/00-EJS809SUPP.
• [14] J. S. Hodges and D. J. Sargent. Counting degrees of freedom in hierarchical and other richly-parameterised models. Biometrika, 88:367–79, 2001.
• [15] A. E. Huisman, R. F. Veerkamp, and J. A. M. Van Arendonk. Genetic parameters for various random regression models to describe the weight data of pigs. Journal of Animal Science, 80:575–82, 2002.
• [16] Raghu N. Kackar and David A. Harville. Approximations for standard errors of estimators of fixed and random effect in mixed linear models. Journal of the American Statistical Association, 79:853–862, 1984.
• [17] George S. Kimeldorf and Grace Wahba. A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Annals of Mathematical Statistics, 41:495–502, 1970.
• [18] Kung Yee Liang and Scott L. Zeger. Longitudinal data analysis using generalized linear models. Biometrika, 73(1):13–22, 1986.
• [19] Karin Meyer. Random regression analyses using $B$-splines to model growth of Australian Angus cattle. Genetics Selection Evolution, 37(5):473–500, 2005.
• [20] Karin Meyer. WOMBAT - a tool for mixed model analyses in quantitative genetics by REML. Journal of Zhejiang University Science B, 8:815–21, 2007.
• [21] L. Ngo and M. P. Wand. Smoothing with mixed model software. Journal of Statistical Software, 9:1–54, 2004.
• [22] J. O. Ramsay and B. W. Silverman. Functional Data Analysis. Springer Series in Statistics. Springer, New York, second edition, 2005.
• [23] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.
• [24] John A. Rice and Colin O. Wu. Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics, 57(1):253–9, 2001.
• [25] Christèle Robert-Granié, Barbara Heude, and Jean-Louis Foulley. Modelling the growth curve of Maine-Anjou beef cattle using heteroskedastic random coefficients models. Genetics Selection Evolution, 34(4):423–45, 2002.
• [26] G. K. Robinson. That BLUP is a good thing: the estimation of random effects. Statistical Science, 6(1):15–51, 1991.
• [27] David Ruppert, M. P. Wand, and R. J. Carroll. Semiparametric Regression. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2003.
• [28] David Ruppert, M. P. Wand, and R. J. Carroll. Semiparametric regression during 2003-2007. Electronic Journal of Statistics, 3:1192–1256, 2010.
• [29] Andrew D. A. C. Smith and M. P. Wand. Streamlined variance calculations for semiparametric mixed models. Statistics in Medicine, 27(3):435–48, 2008.
• [30] C. J. Stone, M. Hansen, C. Kooperberg, and Y. K. Truong. Polynomial splines and their tensor products in extended linear modeling. Annals of Statistics, 25:1371–1425, 1997.
• [31] Yan Sun, Wenyang Zhang, and Howell Tong. Estimation of the covariance matrix of random effects in longitudinal studies. The Annals of Statistics, 35(6):2795–2814, 2007.
• [32] A. A. Szpiro, K. M. Rice, and T. Lumley. Model-robust regression and Bayesian ‘sandwich’ estimator. Annals of Applied Statistics, to appear.
• [33] A. P. Verbyla, B. R. Cullis, M. G. Kenward, and S. J. Welham. The analysis of designed experiments and longitudinal data by using smoothing splines. Journal of The Royal Statistical Society Series C, 48(3):269–311, 1999.
• [34] Sue J. Welham, Brian R. Cullis, Michael G. Kenward, and Robin Thompson. A comparison of mixed model splines for curve fitting. Australian & New Zealand Journal of Statistics, 49(1):1–23, 2007.
• [35] I. M. S. White, R. Thompson, and S. Brotherstone. Genetic and environmental smoothing of lactation curves with cubic splines. Journal of Dairy Science, 82:632–8, 1999.

#### Supplemental materials

• Supplement to “Penalized regression, mixed effects models and appropriate modelling”. The Supplementary Material includes code to obtain estimates of $\mu$ and standard errors, as described in Sections 3 and 4. Also included are: code to produce plots from the paper; code to generate simulated data and to run the simulation for our methods; code to simulate according to the methods of Djeundje and Currie [4]; details of the results of the simulation study; a description of the analysis of the Canadian weather data, assuming that $\mu$ is random, and accompanying code; results of the analysis of the fruit fly data, for both $\mu$ non-random and $\mu$ random.