The Annals of Statistics

Efficient estimation in semivarying coefficient models for longitudinal/clustered data

Ming-Yen Cheng, Toshio Honda, and Jialiang Li



In semivarying coefficient modeling of longitudinal/clustered data, the parametric component, which involves unknown constant coefficients, is usually of primary interest. We first study the semiparametric efficiency bound for estimation of the constant coefficients in a general setup. This bound can be achieved by spline regression using the true within-subject covariance matrices, which are often unavailable in practice. We therefore propose an estimator for the case where the covariance matrices are unknown and depend only on the index variable. First, we estimate the covariance matrices using residuals obtained from a preliminary estimation based on working independence and both spline and local linear regression. Then, using the covariance matrix estimates, we employ spline regression again to obtain our final estimator. It achieves the semiparametric efficiency bound under the normality assumption and has the smallest asymptotic covariance matrix among a class of estimators even when normality is violated. Our theoretical results hold both when the number of within-subject observations diverges and when it is uniformly bounded. In addition, using the local linear estimator of the nonparametric component is superior to using the spline estimator in terms of numerical performance. The proposed method is compared with the working independence estimator and an existing method via simulations and an application to a real data example.
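
The estimation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several simplifying assumptions that do not come from the paper: the model is taken to have the usual semivarying form Y_ij = X_ij' beta + Z_ij' g(T_ij) + e_ij; all subjects share one common grid of index values, so the within-subject covariance is a single m x m matrix; that matrix is estimated by the plain sample covariance of the working-independence residuals rather than by local linear smoothing; and a truncated power basis stands in for whatever spline basis the paper uses. All function names and the simulated data are hypothetical.

```python
import numpy as np

def spline_basis(t, knots, degree=3):
    """Truncated power basis: 1, t, ..., t^degree, then (t - k)_+^degree per knot."""
    powers = [t ** d for d in range(degree + 1)]
    trunc = [np.maximum(t - k, 0.0) ** degree for k in knots]
    return np.column_stack(powers + trunc)

def build_design(X, Z, B):
    """Join constant-coefficient covariates X (n, m, p) with blocks z_l * B(t).
    Z has shape (n, m, d), B has shape (m, K); result is (n, m, p + d*K)."""
    n, m, _ = Z.shape
    varying = (Z[:, :, :, None] * B[None, :, None, :]).reshape(n, m, -1)
    return np.concatenate([X, varying], axis=2)

def weighted_fit(D, Y, W):
    """Solve sum_i D_i' W D_i theta = sum_i D_i' W Y_i for theta."""
    A = np.einsum("imq,mk,ikr->qr", D, W, D)
    b = np.einsum("imq,mk,ik->q", D, W, Y)
    return np.linalg.solve(A, b)

def three_step_estimate(X, Z, Y, t_grid, knots):
    p, m = X.shape[2], Y.shape[1]
    D = build_design(X, Z, spline_basis(t_grid, knots))
    # Step 1: preliminary spline fit under working independence (W = identity).
    theta_wi = weighted_fit(D, Y, np.eye(m))
    # Step 2: covariance estimate from residuals (assumption: plain sample
    # covariance on a common grid, a stand-in for smoothing-based estimation).
    resid = Y - D @ theta_wi
    sigma_hat = resid.T @ resid / resid.shape[0]
    # Step 3: spline regression again, weighted by the inverse covariance estimate.
    theta = weighted_fit(D, Y, np.linalg.inv(sigma_hat))
    return theta[:p], theta_wi[:p], sigma_hat

# Simulated example (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n, m = 200, 10
t_grid = np.linspace(0.05, 0.95, m)
beta_true = np.array([1.0, -0.5])
X = rng.standard_normal((n, m, 2))
Z = np.concatenate([np.ones((n, m, 1)), rng.standard_normal((n, m, 1))], axis=2)
g = np.column_stack([np.sin(2 * np.pi * t_grid), t_grid ** 2])  # g_0(t), g_1(t)
lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
sigma_true = 0.6 ** lags  # AR(1)-type within-subject correlation
eps = rng.standard_normal((n, m)) @ np.linalg.cholesky(sigma_true).T
Y = X @ beta_true + np.sum(Z * g[None, :, :], axis=2) + eps

beta_hat, beta_wi, sigma_hat = three_step_estimate(X, Z, Y, t_grid, knots=[1 / 3, 2 / 3])
print("working independence:", beta_wi)
print("covariance-weighted: ", beta_hat)
```

In this sketch only the first p entries of the fitted coefficient vector are returned, since the constant coefficients are the target of interest; the remaining entries are the spline coefficients of the varying-coefficient functions.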

Article information

Ann. Statist., Volume 44, Number 5 (2016), 1988-2017.

Received: June 2015
Revised: September 2015
First available in Project Euclid: 12 September 2016


Primary: 62G08: Nonparametric regression

Keywords: covariance matrix estimation; local linear regression; semiparametric efficiency bound; spline functions


Cheng, Ming-Yen; Honda, Toshio; Li, Jialiang. Efficient estimation in semivarying coefficient models for longitudinal/clustered data. Ann. Statist. 44 (2016), no. 5, 1988–2017. doi:10.1214/15-AOS1385.




Supplemental materials

  • Additional simulation results and technical material: additional simulation results, proofs of the propositions and lemmas, and theory for the case of uniformly bounded cluster size and general link function.