The Annals of Statistics

Local linear regression for generalized linear models with missing data

R. J. Carroll, Roberto G. Gutierrez, C. Y. Wang, and Suojin Wang



Fan, Heckman and Wand proposed locally weighted kernel polynomial regression methods for generalized linear models and quasilikelihood functions. When the covariate variables are missing at random, we propose a weighted estimator based on the inverse selection probability weights. Distribution theory is derived when the selection probabilities are estimated nonparametrically. We show that the asymptotic variance of the resulting nonparametric estimator of the mean function in the main regression model is the same as that when the selection probabilities are known, while the biases are generally different. This is different from results in parametric problems, where it is known that estimating weights actually decreases asymptotic variance. To reconcile the difference between the parametric and nonparametric problems, we obtain a second-order variance result for the nonparametric case. We generalize this result to local estimating equations. Finite-sample performance is examined via simulation studies. The proposed method is demonstrated via an analysis of data from a case-control study.
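
The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions: an identity link (ordinary local linear least squares rather than a general quasilikelihood), a Gaussian kernel, fixed bandwidths, and selection probabilities that depend only on the always-observed response. The function names (`nw_selection_prob`, `ipw_local_linear`) and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nw_selection_prob(z0, z, delta, h):
    """Nadaraya-Watson estimate of P(delta = 1 | z = z0).

    z is an always-observed variable, delta the 0/1 indicator that the
    covariate was observed, h a bandwidth.
    """
    w = gaussian_kernel((z - z0) / h)
    return np.sum(w * delta) / np.sum(w)

def ipw_local_linear(x0, x, y, delta, pi, h):
    """Inverse-probability-weighted local linear fit at x0 (identity link).

    Only complete cases (delta == 1) enter the fit; each is weighted by
    the kernel weight divided by its selection probability.
    """
    obs = delta == 1
    d = x[obs] - x0
    w = gaussian_kernel(d / h) / pi[obs]
    X = np.column_stack([np.ones_like(d), d])   # local intercept and slope
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y[obs])
    return beta[0]                              # intercept = fitted mean at x0

# Simulated illustration (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2.0 * np.pi * x) + 0.3 * rng.standard_normal(n)
true_pi = 1.0 / (1.0 + np.exp(-(0.5 + y)))      # missingness depends on y only
delta = (rng.uniform(size=n) < true_pi).astype(int)

# Estimate the selection probabilities nonparametrically, then fit at x0 = 0.25.
pi_hat = np.array([nw_selection_prob(yi, y, delta, 0.3) for yi in y])
estimate = ipw_local_linear(0.25, x, y, delta, pi_hat, 0.1)
```

Passing `true_pi` instead of `pi_hat` gives the known-weights version of the same fit, which is the comparison the abstract's variance result concerns.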

Article information

Ann. Statist., Volume 26, Number 3 (1998), 1028-1050.

First available in Project Euclid: 21 June 2002


Primary: 62G07: Density estimation
Secondary: 62G20

Generalized linear models; kernel regression; local linear smoother; measurement error; missing at random; quasilikelihood functions


Wang, C. Y.; Wang, Suojin; Gutierrez, Roberto G.; Carroll, R. J. Local linear regression for generalized linear models with missing data. Ann. Statist. 26 (1998), no. 3, 1028--1050. doi:10.1214/aos/1024691087.



  • Bruemmer, B., White, E., Vaughan, T. and Cheney, C. (1996). Nutrient intake in relationship to bladder cancer among middle aged men and women. Amer. J. Epidemiology 144 485-495.
  • Carroll, R. J., Ruppert, D. and Welsh, A. H. (1998). Local estimating equations. J. Amer. Statist. Assoc. 93 214-227.
  • Carroll, R. J., Fan, J., Gijbels, I. and Wand, M. (1997). Generalized partially linear single-index models. J. Amer. Statist. Assoc. 92 477-489.
  • Fan, J., Heckman, N. E. and Wand, M. P. (1995). Local polynomial kernel regression for generalized linear models and quasilikelihood functions. J. Amer. Statist. Assoc. 90 141-150.
  • Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc. 47 663-685.
  • Little, R. J. A. and Rubin, D. B. (1987). Statistical Analysis with Missing Data. Wiley, New York.
  • McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London.
  • Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized linear models. J. Roy. Statist. Soc. Ser. A 135 370-384.
  • Prentice, R. L. and Pyke, R. (1979). Logistic disease incidence models and case-control studies. Biometrika 66 403-411.
  • Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc. 89 846-866.
  • Rubin, D. B. (1976). Inference and missing data. Biometrika 63 581-592.
  • Schucany, W. R. (1995). Adaptive bandwidth choice for kernel regression. J. Amer. Statist. Assoc. 90 535-540.
  • Severini, T. A. and Staniswalis, J. G. (1994). Quasilikelihood estimation in semiparametric models. J. Amer. Statist. Assoc. 89 501-511.
  • Staniswalis, J. G. (1989). The kernel estimate of a regression function in likelihood-based models. J. Amer. Statist. Assoc. 84 276-283.
  • Wang, C. Y., Wang, S., Zhao, L. P. and Ou, S. T. (1997). Weighted semiparametric estimation in regression analysis with missing covariate data. J. Amer. Statist. Assoc. 92 512-525.
  • Wedderburn, R. W. M. (1974). Quasilikelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 61 439-447.
  • White, J. E. (1982). A two stage design for the study of the relationship between a rare exposure and a rare disease. Amer. J. Epidemiology 115 119-128.