The Annals of Statistics

Projected spline estimation of the nonparametric function in high-dimensional partially linear models for massive data

Heng Lian, Kaifeng Zhao, and Shaogao Lv



In this paper, we consider the local asymptotics of the nonparametric function in a partially linear model within the divide-and-conquer estimation framework. Unlike the fixed-dimensional setting, in which the parametric part does not affect the nonparametric part, the high-dimensional setting complicates the issue. In particular, when a sparsity-inducing penalty such as the lasso is used to make estimation of the linear part feasible, the resulting bias propagates to the nonparametric part. We propose a novel approach to estimation of the nonparametric function and establish the local asymptotics of the estimator. The result is useful for massive data with possibly different linear coefficients in each subpopulation but a common nonparametric function. Some numerical illustrations are also presented.
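The pipeline the abstract describes can be made concrete with a toy sketch of the *naive* divide-and-conquer estimator: split the data into chunks, run the lasso on each chunk's linear part, spline-fit the partial residuals, and average the fitted curves. This is a minimal illustration under simplifying assumptions, not the authors' proposed estimator (which is designed to correct the lasso bias that this naive version inherits); all function names (`lasso_cd`, `spline_basis`, `naive_dc_fit`) are hypothetical, and a truncated-power basis stands in for the B-splines used in the paper.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam*||b||_1.
    Illustrative only; production code would use a tuned solver."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with coordinate j removed from the fit.
            r_j = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r_j, n * lam) / col_sq[j]
    return beta

def spline_basis(t, n_interior=5, degree=3):
    """Truncated-power polynomial spline basis on [0, 1] -- a simplified
    stand-in for the B-spline basis used in the paper."""
    knots = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    cols = [t ** d for d in range(degree + 1)]
    cols += [np.maximum(t - k, 0.0) ** degree for k in knots]
    return np.column_stack(cols)

def naive_dc_fit(chunks, lam, t_grid):
    """Naive divide-and-conquer: on each chunk, lasso for the linear part,
    then a least-squares spline fit to the partial residuals y - X @ beta_hat;
    finally, average the fitted curves across chunks.  The lasso shrinkage
    bias in beta_hat leaks into the residuals -- exactly the propagation
    problem the paper addresses."""
    fits = []
    for X, t, y in chunks:
        beta_hat = lasso_cd(X, y, lam)
        B = spline_basis(t)
        theta, *_ = np.linalg.lstsq(B, y - X @ beta_hat, rcond=None)
        fits.append(spline_basis(t_grid) @ theta)
    return np.mean(fits, axis=0)
```

With synthetic data `y = X @ beta + sin(2*pi*t) + noise` split into a few chunks, the averaged curve tracks `sin(2*pi*t)` up to noise plus a systematic component driven by the lasso bias, which does not vanish with averaging; this is the motivation for the paper's projected spline construction.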

Article information

Ann. Statist., Volume 47, Number 5 (2019), 2922-2949.

Received: April 2018
Revised: July 2018
First available in Project Euclid: 3 August 2019


Primary: 62G08: Nonparametric regression
Secondary: 62G20: Asymptotic properties

Keywords: Asymptotic normality; B-splines; local asymptotics; profiled estimation


Lian, Heng; Zhao, Kaifeng; Lv, Shaogao. Projected spline estimation of the nonparametric function in high-dimensional partially linear models for massive data. Ann. Statist. 47 (2019), no. 5, 2922--2949. doi:10.1214/18-AOS1769.


