## Bernoulli

• Volume 23, Number 4A (2017), 2693-2719.

### Efficiency transfer for regression models with responses missing at random

#### Abstract

We consider independent observations on a random pair $(X,Y)$, where the response $Y$ is allowed to be missing at random but the covariate vector $X$ is always observed. We demonstrate that characteristics of the conditional distribution of $Y$ given $X$ can be estimated efficiently using complete case analysis. That is, one can simply omit the incomplete cases and apply an estimator that is efficient in the full-data model, and it remains efficient. In particular, there is no need for imputation or inverse probability weighting. Asymptotically, those approaches can never do better than this complete case method.
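
As a small illustration of complete case analysis, the sketch below simulates responses that are missing at random. With $\delta$ the indicator that $Y$ is observed, this means $P(\delta = 1 \mid X, Y) = P(\delta = 1 \mid X)$. It then fits a linear regression using only the complete cases. The function name `complete_case_ols`, the linear model, the normal errors and the logistic observation probability are all hypothetical choices made for this example. Ordinary least squares is efficient only when the errors are normal, so it stands in here for "an estimator that is efficient in the full-data model" and is not the paper's general construction.

```python
import numpy as np


def complete_case_ols(x, y, observed):
    """Least squares fit of y = a + b*x using only the complete cases.

    `observed` is the boolean indicator delta (True where y was observed).
    Rows with missing y are dropped: no imputation, no inverse probability weights.
    """
    x_cc = x[observed]
    y_cc = y[observed]
    design = np.column_stack([np.ones_like(x_cc), x_cc])
    coef, *_ = np.linalg.lstsq(design, y_cc, rcond=None)
    return coef  # array([intercept, slope])


rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1.0, 1.0, size=n)
# Hypothetical true model: intercept 1, slope 2, normal errors independent of x.
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# MAR mechanism (assumed): the chance of observing y depends on x only, not on y.
pi_x = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))
observed = rng.uniform(size=n) < pi_x
y_obs = np.where(observed, y, np.nan)  # missing responses stored as NaN

intercept, slope = complete_case_ols(x, y_obs, observed)
print(f"complete cases: {observed.sum()} of {n}")
print(f"intercept ~ {intercept:.3f}, slope ~ {slope:.3f}")
```

Because missingness depends only on $X$, dropping incomplete rows leaves the conditional law of $Y$ given $X$ unchanged among the complete cases. That is why the fit above targets the same regression parameters as a full-data fit would.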

This efficiency transfer is a general result. It holds for all regression models in which the conditional distribution of $Y$ given $X$ and the marginal distribution of $X$ do not share common parameters. We apply it to the general homoscedastic semiparametric regression model. This includes models in which the conditional expectation is modeled by a complex semiparametric regression function, as well as basic models such as linear regression and nonparametric regression. We discuss estimation of various functionals of the conditional distribution, for example, regression parameters and the error distribution.
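
To illustrate a functional of the error distribution, the sketch below assumes a homoscedastic model $Y = r(X) + \varepsilon$ with $\varepsilon$ independent of $X$. It estimates the error distribution function $F(t) = P(\varepsilon \le t)$ by the empirical distribution function of complete-case residuals. The linear choice of $r$, the Laplace errors, the MAR mechanism and the function name `complete_case_residual_edf` are hypothetical and chosen only for this example. This plain residual-based estimator is a simple illustration and is not claimed to be the efficient estimator studied in the paper.

```python
import numpy as np


def complete_case_residual_edf(x, y, observed, t):
    """Empirical distribution function of complete-case residuals, evaluated at t."""
    # Keep only the complete cases (rows where y was observed).
    x_cc, y_cc = x[observed], y[observed]
    # Fit the (assumed linear) regression function by least squares.
    design = np.column_stack([np.ones_like(x_cc), x_cc])
    coef, *_ = np.linalg.lstsq(design, y_cc, rcond=None)
    residuals = y_cc - design @ coef
    # F_hat(t) = share of complete-case residuals that are <= t.
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return (residuals[None, :] <= t[:, None]).mean(axis=1)


rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(-1.0, 1.0, size=n)
# Laplace errors (scale 0.5), independent of x: a homoscedastic model.
y = 1.0 + 2.0 * x + rng.laplace(scale=0.5, size=n)

# MAR mechanism (assumed): the observation probability depends on x only.
pi_x = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))
observed = rng.uniform(size=n) < pi_x
y_obs = np.where(observed, y, np.nan)

t_grid = np.array([-0.5, 0.0, 0.5])
F_hat = complete_case_residual_edf(x, y_obs, observed, t_grid)
# True Laplace(0, 0.5) cdf at these points: about 0.184, 0.5 and 0.816.
print(np.round(F_hat, 3))
```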

#### Article information

Source
Bernoulli, Volume 23, Number 4A (2017), 2693-2719.

Dates
Revised: September 2015
First available in Project Euclid: 9 May 2017

https://projecteuclid.org/euclid.bj/1494316829

Digital Object Identifier
doi:10.3150/16-BEJ824

Mathematical Reviews number (MathSciNet)
MR3648042

Zentralblatt MATH identifier
1382.62017

#### Citation

Müller, Ursula U.; Schick, Anton. Efficiency transfer for regression models with responses missing at random. Bernoulli 23 (2017), no. 4A, 2693--2719. doi:10.3150/16-BEJ824. https://projecteuclid.org/euclid.bj/1494316829
