## The Annals of Statistics

### Partially linear additive quantile regression in ultra-high dimension

#### Abstract

We consider a flexible semiparametric quantile regression model for analyzing high-dimensional heterogeneous data. This model has several appealing features: (1) By considering different conditional quantiles, we may obtain a more complete picture of the conditional distribution of a response variable given high-dimensional covariates. (2) The sparsity level is allowed to be different at different quantile levels. (3) The partially linear additive structure accommodates nonlinearity and circumvents the curse of dimensionality. (4) It is naturally robust to heavy-tailed distributions. In this paper, we approximate the nonlinear components using B-spline basis functions. We first study estimation under this model when the nonzero components are known in advance and the number of covariates in the linear part diverges. We then investigate a nonconvex penalized estimator for simultaneous variable selection and estimation. We derive its oracle property for a general class of nonconvex penalty functions in the presence of ultra-high dimensional covariates under relaxed conditions. To tackle the challenges posed by a nonsmooth loss function, a nonconvex penalty function and the presence of nonlinear components, we combine a recently developed convex-differencing method with modern empirical process techniques. Monte Carlo simulations and an application to a microarray study demonstrate the effectiveness of the proposed method. We also discuss how the method for a single quantile of interest can be extended to simultaneous variable selection and estimation at multiple quantiles.
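As a rough illustration of the kind of objective the abstract describes, the sketch below evaluates a penalized quantile loss for a partially linear additive fit. It is a minimal sketch under stated assumptions, not the authors' implementation: the SCAD penalty is used as one well-known member of the nonconvex class, the penalty is assumed to act only on the linear coefficients, the spline basis matrix `W` is assumed to be precomputed, and all function and argument names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def scad_penalty(b, lam, a=3.7):
    """SCAD penalty at coefficient b (assumed example of a nonconvex penalty).

    Linear near zero, quadratic in between, constant beyond a * lam.
    """
    b = abs(b)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def penalized_objective(y, X, W, beta, xi, tau, lam):
    """Average check loss plus SCAD penalty on the linear coefficients.

    y    : list of responses
    X    : rows of covariates entering linearly
    W    : rows of precomputed spline basis values (hypothetical input)
    beta : linear coefficients (penalized)
    xi   : spline coefficients (left unpenalized in this sketch)
    """
    total = 0.0
    for y_i, x_row, w_row in zip(y, X, W):
        fitted = sum(x * b for x, b in zip(x_row, beta))
        fitted += sum(w * c for w, c in zip(w_row, xi))
        total += check_loss(y_i - fitted, tau)
    return total / len(y) + sum(scad_penalty(b, lam) for b in beta)
```

A call such as `penalized_objective(y, X, W, beta, xi, tau=0.5, lam=0.1)` returns the objective value for a candidate `(beta, xi)` at the median. Minimizing it is a separate, harder problem because the loss is nonsmooth and the penalty is nonconvex, which is the difficulty the abstract says the convex-differencing approach is meant to address.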

#### Article information

Source
Ann. Statist., Volume 44, Number 1 (2016), 288–317.

Dates
Received: September 2014
Revised: July 2015
First available in Project Euclid: 10 December 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1449755964

Digital Object Identifier
doi:10.1214/15-AOS1367

Mathematical Reviews number (MathSciNet)
MR3449769

Zentralblatt MATH identifier
1331.62264

Subjects
Primary: 62G35: Robustness
Secondary: 62G20: Asymptotic properties

#### Citation

Sherwood, Ben; Wang, Lan. Partially linear additive quantile regression in ultra-high dimension. Ann. Statist. 44 (2016), no. 1, 288–317. doi:10.1214/15-AOS1367. https://projecteuclid.org/euclid.aos/1449755964

#### Supplemental materials

• Supplemental Material to “Partially linear additive quantile regression in ultra-high dimension”. We provide technical details for some of the proofs and additional simulation results.