The Annals of Statistics

Nonasymptotic analysis of semiparametric regression models with high-dimensional parametric coefficients

Ying Zhu

Abstract

We consider a two-step projection based Lasso procedure for estimating a partially linear regression model where the number of coefficients in the linear component can exceed the sample size and these coefficients belong to the $l_{q}$-“balls” for $q\in[0,1]$. Our theoretical results regarding the properties of the estimators are nonasymptotic. In particular, we establish a new nonasymptotic “oracle” result: Although the error of the nonparametric projection per se (with respect to the prediction norm) has the scaling $t_{n}$ in the first step, it only contributes a scaling $t_{n}^{2}$ in the $l_{2}$-error of the second-step estimator for the linear coefficients. This new “oracle” result holds for a large family of nonparametric least squares procedures and regularized nonparametric least squares procedures for the first-step estimation and the driver behind it lies in the projection strategy. We specialize our analysis to the estimation of a semiparametric sample selection model and provide a simple method with theoretical guarantees for choosing the regularization parameter in practice.
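
To make the two-step idea concrete, the following is a minimal sketch and not the paper's actual estimator or tuning rule. It assumes the standard partially linear model $y_{i}=x_{i}^{\top}\beta^{*}+g(z_{i})+\varepsilon_{i}$, uses a polynomial series least-squares fit as a stand-in for the first-step nonparametric projection, and solves the second-step Lasso by plain coordinate descent. The function names, the simulated design, the basis degree and the value of the regularization parameter `lam` are all illustrative assumptions.

```python
import numpy as np


def series_basis(z, degree=5):
    """Polynomial basis [1, z, ..., z^degree] (assumed first-step sieve)."""
    return np.vander(z, degree + 1, increasing=True)


def project_out(basis, target):
    """Residual of `target` after least-squares projection onto `basis`."""
    coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return target - basis @ coef


def soft_threshold(a, lam):
    """Scalar soft-thresholding operator."""
    return np.sign(a) * max(abs(a) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||_2^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # invariant: resid == y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def simulate_and_fit(n=200, p=300, sigma=0.5, seed=0):
    """Simulate a sparse partially linear model with p > n and fit it."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(-1.0, 1.0, n)
    # Covariates depend on z, so ignoring g(z) would bias the linear fit.
    X = 0.5 * z[:, None] + rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:3] = [2.0, -1.5, 1.0]
    g = np.sin(2.0 * z) + z ** 2  # hypothetical nonparametric component
    y = X @ beta_true + g + sigma * rng.standard_normal(n)

    # Step 1: project y and every column of X on the basis in z, keep residuals.
    basis = series_basis(z)
    X_res = project_out(basis, X)
    y_res = project_out(basis, y)

    # Step 2: Lasso on the residuals; this lam is an ad hoc illustrative choice.
    lam = 1.1 * sigma * np.sqrt(2.0 * np.log(p) / n)
    beta_hat = lasso_cd(X_res, y_res, lam)
    return beta_hat, beta_true


beta_hat, beta_true = simulate_and_fit()
print("l2 error:", np.linalg.norm(beta_hat - beta_true))
print("three largest coefficients at:", np.argsort(-np.abs(beta_hat))[:3])
```

Residualizing both $y$ and the covariates on the same basis before running the Lasso is what "projection" means in this sketch; any other first-step smoother could be substituted in `project_out` under the same assumptions.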

Article information

Source
Ann. Statist., Volume 45, Number 5 (2017), 2274-2298.

Dates
Received: October 2015
Revised: November 2016
First available in Project Euclid: 31 October 2017

Permanent link to this document
https://projecteuclid.org/euclid.aos/1509436835

Digital Object Identifier
doi:10.1214/16-AOS1528

Mathematical Reviews number (MathSciNet)
MR3718169

Zentralblatt MATH identifier
06821126

Subjects
Primary: 62J02: General nonlinear regression
Secondary: 62N01: Censored data models; 62N02: Estimation; 62G08: Nonparametric regression; 62J12: Generalized linear models

Keywords
High-dimensional statistics; Lasso; nonasymptotic analysis; partially linear models; sample selection

Citation

Zhu, Ying. Nonasymptotic analysis of semiparametric regression models with high-dimensional parametric coefficients. Ann. Statist. 45 (2017), no. 5, 2274–2298. doi:10.1214/16-AOS1528. https://projecteuclid.org/euclid.aos/1509436835



Supplemental materials

  • Supplementary materials for “Nonasymptotic analysis of semiparametric regression models with high-dimensional parametric coefficients”. This supplement contains two appendices: Appendix A provides the proofs of the main results, and Appendix S provides the remaining technical lemmas and proofs.