## The Annals of Statistics

### Oracle inequalities for sparse additive quantile regression in reproducing kernel Hilbert space

#### Abstract

This paper considers estimation of sparse additive quantile regression (SAQR) in high-dimensional settings. Given the nonsmooth nature of the quantile loss function and the nonparametric complexity of estimating the component functions, it is challenging to analyze the theoretical properties of ultrahigh-dimensional SAQR. We propose a regularized learning approach with a two-fold Lasso-type regularization in a reproducing kernel Hilbert space (RKHS) for SAQR. We establish nonasymptotic oracle inequalities for the excess risk of the proposed estimator without any coherence conditions. If additional assumptions, including an extension of the restricted eigenvalue condition, are satisfied, the proposed method enjoys sharp oracle rates without a light-tail requirement. In particular, the proposed estimator achieves the minimax lower bounds established for sparse additive mean regression. As a by-product, we also establish a concentration inequality for estimating the population mean when a general Lipschitz loss is involved. The practical effectiveness of the new method is demonstrated by competitive numerical results.
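To make the objective concrete, the sketch below illustrates the quantile (pinball) loss and a penalized criterion of the "two-fold Lasso-type" form described in the abstract: an empirical-norm penalty plus an RKHS-norm penalty on each additive component. This is a minimal illustration under assumed conventions, not the authors' implementation; the function names (`pinball`, `saqr_objective`) and the evaluation-matrix interface are hypothetical.

```python
import numpy as np

def pinball(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}) for quantile level tau."""
    return u * (tau - (u < 0))

def saqr_objective(f_vals, rkhs_norms, y, tau, lam1, lam2):
    """Two-fold penalized empirical risk for an additive fit.

    f_vals     : (n, p) array, column j holding evaluations f_j(x_{ij})
    rkhs_norms : length-p array of ||f_j|| in the RKHS
    The penalty sums the empirical L2 norms of the components (Lasso-type)
    and their RKHS norms (smoothness-type).
    """
    resid = y - f_vals.sum(axis=1)
    emp_norms = np.sqrt((f_vals ** 2).mean(axis=0))
    return (pinball(resid, tau).mean()
            + lam1 * emp_norms.sum()
            + lam2 * np.sum(rkhs_norms))

# Minimizing the empirical pinball loss over a constant recovers
# (approximately, up to grid resolution) the sample tau-quantile.
rng = np.random.default_rng(0)
y = rng.normal(size=1001)
tau = 0.25
grid = np.linspace(y.min(), y.max(), 2000)
risk = np.array([pinball(y - c, tau).mean() for c in grid])
c_hat = grid[risk.argmin()]
```

Here `c_hat` lands close to `np.quantile(y, tau)`, which is the familiar fact that the pinball loss elicits conditional quantiles.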

#### Article information

Source
Ann. Statist., Volume 46, Number 2 (2018), 781–813.

Dates
Revised: January 2017
First available in Project Euclid: 3 April 2018

https://projecteuclid.org/euclid.aos/1522742436

Digital Object Identifier
doi:10.1214/17-AOS1567

Mathematical Reviews number (MathSciNet)
MR3782384

Zentralblatt MATH identifier
06870279

Subjects
Primary: 62G20: Asymptotic properties
Secondary: 62G05: Estimation

#### Citation

Lv, Shaogao; Lin, Huazhen; Lian, Heng; Huang, Jian. Oracle inequalities for sparse additive quantile regression in reproducing kernel Hilbert space. Ann. Statist. 46 (2018), no. 2, 781--813. doi:10.1214/17-AOS1567. https://projecteuclid.org/euclid.aos/1522742436

#### References

• [1] Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337–404.
• [2] Bach, F., Jenatton, R., Mairal, J. and Obozinski, G. (2012). Convex optimization with sparsity-inducing norms. In Optimization for Machine Learning. MIT Press, Cambridge, MA.
• [3] Bartlett, P. L., Bousquet, O. and Mendelson, S. (2005). Local Rademacher complexities. Ann. Statist. 33 1497–1537.
• [4] Bartlett, P. L. and Mendelson, S. (2002). Rademacher and Gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res. 3 463–482.
• [5] Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183–202.
• [6] Belloni, A. and Chernozhukov, V. (2011). $\ell_{1}$-penalized quantile regression in high-dimensional sparse models. Ann. Statist. 39 82–130.
• [7] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• [8] Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495–500.
• [9] Breheny, P. and Huang, J. (2015). Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Stat. Comput. 25 173–187.
• [10] Buchinsky, M. (1994). Changes in the U.S. wage structure 1963–1987: Application of quantile regression. Econometrica 62 405–458.
• [11] Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
• [12] Chatterjee, A. and Lahiri, S. N. (2011). Bootstrapping lasso estimators. J. Amer. Statist. Assoc. 106 608–625.
• [13] Chatterjee, A. and Lahiri, S. N. (2013). Rates of convergence of the adaptive LASSO estimators to the oracle distribution and higher order refinements by the bootstrap. Ann. Statist. 41 1232–1259.
• [14] Christmann, A. and Zhou, D.-X. (2016). Learning rates for the risk of kernel-based quantile regression estimators in additive models. Anal. Appl. (Singap.) 14 449–477.
• [15] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
• [16] He, X. (2009). Modeling and inference by quantile regression. Technical report, Dept. Statistics, Univ. Illinois at Urbana–Champaign.
• [17] He, X., Wang, L. and Hong, H. G. (2013). Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann. Statist. 41 342–369.
• [18] Horowitz, J. L. and Lee, S. (2005). Nonparametric estimation of an additive quantile regression model. J. Amer. Statist. Assoc. 100 1238–1249.
• [19] Huang, J., Horowitz, J. L. and Wei, F. (2010). Variable selection in nonparametric additive models. Ann. Statist. 38 2282–2313.
• [20] Hunter, D. R. and Lange, K. (2000). Quantile regression via an MM algorithm. J. Comput. Graph. Statist. 9 60–77.
• [21] Hunter, D. R. and Lange, K. (2004). A tutorial on MM algorithms. Amer. Statist. 58 30–37.
• [22] Kato, K. (2016). Group Lasso for high dimensional sparse quantile regression models. arXiv:1103.1458.
• [23] Koenker, R. (2005). Quantile Regression. Econometric Society Monographs 38. Cambridge Univ. Press, Cambridge.
• [24] Koenker, R. and Bassett, G. Jr. (1978). Regression quantiles. Econometrica 46 33–50.
• [25] Koenker, R. W. and D’Orey, V. (1987). Algorithm AS 229: Computing regression quantiles. J. R. Stat. Soc. Ser. C. Appl. Stat. 36 383–384.
• [26] Koltchinskii, V. and Yuan, M. (2010). Sparsity in multiple kernel learning. Ann. Statist. 38 3660–3695.
• [27] Li, Y., Liu, Y. and Zhu, J. (2007). Quantile regression in reproducing kernel Hilbert spaces. J. Amer. Statist. Assoc. 102 255–268.
• [28] Lian, H. (2012). Semiparametric estimation of additive quantile regression models by two-fold penalty. J. Bus. Econom. Statist. 30 337–350.
• [29] Lin, Y. and Zhang, H. H. (2006). Component selection and smoothing in multivariate nonparametric regression. Ann. Statist. 34 2272–2297.
• [30] Lv, J. and Fan, Y. (2009). A unified approach to model selection and sparse recovery using regularized least squares. Ann. Statist. 37 3498–3528.
• [31] Lv, S., He, X. and Wang, J. (2017). A unified penalized method for sparse additive quantile models: An RKHS approach. Ann. Inst. Statist. Math. 69 897–923.
• [32] Lv, S., Lin, H., Lian, H. and Huang, J. (2018). Supplement to “Oracle inequalities for sparse additive quantile regression in reproducing kernel Hilbert space.” DOI:10.1214/17-AOS1567SUPP.
• [33] Meier, L., van de Geer, S. and Bühlmann, P. (2009). High-dimensional additive modeling. Ann. Statist. 37 3779–3821.
• [34] Mendelson, S. (2002). Geometric parameters of kernel machines. In Computational Learning Theory (Sydney, 2002). Lecture Notes in Computer Science 2375 29–43. Springer, Berlin.
• [35] Pearce, N. D. and Wand, M. P. (2006). Penalized splines and reproducing kernel methods. Amer. Statist. 60 233–240.
• [36] Raskutti, G., Wainwright, M. J. and Yu, B. (2012). Minimax-optimal rates for sparse additive models over kernel classes via convex programming. J. Mach. Learn. Res. 13 389–427.
• [37] Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 71 1009–1030.
• [38] Rigollet, P. and Tsybakov, A. (2011). Exponential screening and optimal rates of sparse estimation. Ann. Statist. 39 731–771.
• [39] Schölkopf, B. and Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA.
• [40] Steinwart, I. and Christmann, A. (2008). Support Vector Machines. Springer, New York.
• [41] Steinwart, I. and Christmann, A. (2011). Estimating conditional quantiles with the help of the pinball loss. Bernoulli 17 211–225.
• [42] Suzuki, T. and Sugiyama, M. (2013). Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness. Ann. Statist. 41 1381–1405.
• [43] Tarigan, B. and van de Geer, S. A. (2006). Classifiers of support vector machine type with $l_{1}$ complexity regularization. Bernoulli 12 1045–1076.
• [44] The Cancer Genome Atlas Network (2012). Comprehensive molecular portraits of human breast tumours. Nature 490 61–70.
• [45] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 267–288.
• [46] Tseng, P. and Yun, S. (2009). A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117 387–423.
• [47] van de Geer, S. (2002). Empirical Processes in M-Estimation. Cambridge Univ. Press, Cambridge.
• [48] van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614–645.
• [49] Wang, L., Wu, Y. and Li, R. (2012). Quantile regression for analyzing heterogeneity in ultra-high dimension. J. Amer. Statist. Assoc. 107 214–222.
• [50] Wei, F., Huang, J. and Li, H. (2011). Variable selection and estimation in high-dimensional varying-coefficient models. Statist. Sinica 21 1515–1540.
• [51] Wu, Y. and Liu, Y. (2009). Variable selection in quantile regression. Statist. Sinica 19 801–817.
• [52] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 49–67.
• [53] Zhang, X., Wu, Y., Wang, L. and Li, R. (2016). Variable selection for support vector machines in moderately high dimensions. J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 53–76.
• [54] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.

#### Supplemental materials

• Supplement to “Oracle inequalities for sparse additive quantile regression in reproducing kernel Hilbert space”. To highlight the nature and usefulness of Assumptions 3–4, the Supplementary Material states simple sufficient conditions under which each of them holds. In addition, owing to space limitations, the proofs of Theorem 1 and Lemma 2 are also given in the Supplementary Material.