## The Annals of Statistics

### Doubly penalized estimation in additive regression with high-dimensional data

#### Abstract

Additive regression extends linear regression by modeling the signal of a response as a sum of functions of covariates of relatively low complexity. We study penalized estimation in high-dimensional nonparametric additive regression, where functional semi-norms are used to induce smoothness of the component functions and the empirical $L_{2}$ norm is used to induce sparsity. The functional semi-norms can be of Sobolev or bounded variation types and are allowed to differ among individual component functions. We establish oracle inequalities for the predictive performance of such methods under three simple technical conditions: a sub-Gaussian condition on the noise, a compatibility condition on the design and the functional classes under consideration, and an entropy condition on the functional classes. For random designs, the sample compatibility condition can be replaced by its population version under an additional condition ensuring suitable convergence of empirical norms. In homogeneous settings where the complexities of the component functions are of the same order, our results provide a spectrum of minimax convergence rates, from the so-called slow rate, which does not require the compatibility condition, to the fast rate under hard sparsity or under certain $L_{q}$ sparsity, which allows many small components in the true regression function. These results significantly broaden and sharpen existing ones in the literature.
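The abstract describes the estimator only in words. As a minimal sketch, assuming an objective of the form $\frac{1}{2}\|Y-\sum_{j=1}^{p} f_j\|_n^2+\sum_{j=1}^{p}\big(\rho_j\|f_j\|_{F,j}+\lambda_j\|f_j\|_n\big)$, with a bounded-variation (total variation) semi-norm as the smoothness penalty $\|f_j\|_{F,j}$ and the empirical $L_2$ norm $\|f_j\|_n$ as the sparsity penalty, the code below evaluates that objective for given fitted component values. The names `rho`, `lam`, `total_variation` and `doubly_penalized_objective` are hypothetical, the intercept and centering of components are omitted, and this illustrates the two penalties only; it is not the authors' code or fitting algorithm.

```python
import numpy as np


def total_variation(f_vals: np.ndarray, x: np.ndarray) -> float:
    """Discrete total variation of one component: the sum of absolute jumps
    between consecutive fitted values after sorting by the covariate."""
    order = np.argsort(x, kind="stable")
    return float(np.sum(np.abs(np.diff(f_vals[order]))))


def empirical_norm(f_vals: np.ndarray) -> float:
    """Empirical L2 norm ||f||_n = sqrt((1/n) * sum_i f(x_i)^2)."""
    return float(np.sqrt(np.mean(f_vals ** 2)))


def doubly_penalized_objective(y, X, F, rho, lam) -> float:
    """Half mean squared residual plus, for each component j,
    rho[j] * TV(f_j) + lam[j] * ||f_j||_n  (assumed form, see lead-in).

    y:   (n,) responses
    X:   (n, p) covariates
    F:   (n, p) fitted component values, F[i, j] = f_j(X[i, j])
    rho: (p,) smoothness tuning parameters (hypothetical name)
    lam: (p,) sparsity tuning parameters (hypothetical name)
    """
    y, X, F = np.asarray(y, float), np.asarray(X, float), np.asarray(F, float)
    resid = y - F.sum(axis=1)
    loss = 0.5 * np.mean(resid ** 2)
    penalty = sum(
        rho[j] * total_variation(F[:, j], X[:, j]) + lam[j] * empirical_norm(F[:, j])
        for j in range(X.shape[1])
    )
    return float(loss + penalty)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 50, 3
    X = rng.uniform(size=(n, p))
    y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(n)
    F = np.zeros((n, p))
    F[:, 0] = np.sin(2 * np.pi * X[:, 0])  # only the first component is active
    print(doubly_penalized_objective(y, X, F, rho=[0.01] * p, lam=[0.05] * p))
```

With all components set to zero, both penalties vanish and the objective reduces to half the mean squared response. A minimizer trades this loss against the two penalties: the empirical-norm term encourages entire components to be zero (sparsity), while the semi-norm term controls how rough each nonzero component may be.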

#### Article information

Source
Ann. Statist., Volume 47, Number 5 (2019), 2567–2600.

Dates
Revised: July 2018
First available in Project Euclid: 3 August 2019

https://projecteuclid.org/euclid.aos/1564797857

Digital Object Identifier
doi:10.1214/18-AOS1757

Mathematical Reviews number (MathSciNet)
MR3988766

Zentralblatt MATH identifier
07114922

#### Citation

Tan, Zhiqiang; Zhang, Cun-Hui. Doubly penalized estimation in additive regression with high-dimensional data. Ann. Statist. 47 (2019), no. 5, 2567–2600. doi:10.1214/18-AOS1757. https://projecteuclid.org/euclid.aos/1564797857

#### References

• Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169–194.
• Dalalyan, A., Ingster, Y. and Tsybakov, A. B. (2014). Statistical inference in compound functional models. Probab. Theory Related Fields 158 513–532.
• DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 303. Springer, Berlin.
• Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
• Gu, C. (2002). Smoothing Spline ANOVA Models. Springer Series in Statistics. Springer, New York.
• Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Monographs on Statistics and Applied Probability 43. CRC Press, London.
• Huang, J., Horowitz, J. L. and Wei, F. (2010). Variable selection in nonparametric additive models. Ann. Statist. 38 2282–2313.
• Kim, S.-J., Koh, K., Boyd, S. and Gorinevsky, D. (2009). $l_{1}$ trend filtering. SIAM Rev. 51 339–360.
• Koltchinskii, V., Lounici, K. and Tsybakov, A. B. (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Statist. 39 2302–2329.
• Koltchinskii, V. and Yuan, M. (2010). Sparsity in multiple kernel learning. Ann. Statist. 38 3660–3695.
• Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und Ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)] 23. Springer, Berlin.
• Lin, Y. and Zhang, H. H. (2006). Component selection and smoothing in multivariate nonparametric regression. Ann. Statist. 34 2272–2297.
• Lorentz, G. G., Golitschek, M. V. and Makovoz, Y. (1996). Constructive Approximation: Advanced Problems. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 304. Springer, Berlin.
• Mammen, E. (1991). Nonparametric regression under qualitative smoothness assumptions. Ann. Statist. 19 741–759.
• Mammen, E. and van de Geer, S. (1997). Locally adaptive regression splines. Ann. Statist. 25 387–413.
• Meier, L., van de Geer, S. and Bühlmann, P. (2009). High-dimensional additive modeling. Ann. Statist. 37 3779–3821.
• Müller, P. and van de Geer, S. (2015). The partial linear model in high dimensions. Scand. J. Stat. 42 580–608.
• Nirenberg, L. (1966). An extended interpolation inequality. Ann. Sc. Norm. Super. Pisa Cl. Sci. (3) 20 733–737.
• Petersen, A., Witten, D. and Simon, N. (2016). Fused lasso additive model. J. Comput. Graph. Statist. 25 1005–1025.
• Raskutti, G., Wainwright, M. J. and Yu, B. (2012). Minimax-optimal rates for sparse additive models over kernel classes via convex programming. J. Mach. Learn. Res. 13 389–427.
• Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 71 1009–1030.
• Sadhanala, V. and Tibshirani, R. J. (2017). Additive models with trend filtering. Preprint. Available at arXiv:1702.05037.
• Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10 1040–1053.
• Stone, C. J. (1985). Additive regression and other nonparametric models. Ann. Statist. 13 689–705.
• Suzuki, T. and Sugiyama, M. (2013). Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness. Ann. Statist. 41 1381–1405.
• Tan, Z. and Zhang, C.-H. (2019). Supplement to “Doubly penalized estimation in additive regression with high-dimensional data.” DOI:10.1214/18-AOS1757SUPP.
• Tibshirani, R. J. (2014). Adaptive piecewise polynomial estimation via trend filtering. Ann. Statist. 42 285–323.
• van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge Univ. Press, Cambridge.
• van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360–1392.
• van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, New York.
• Yang, T. and Tan, Z. (2018). Backfitting algorithms for total-variation and empirical-norm penalized additive modelling with high-dimensional data. Stat 7 e198.
• Yang, Y. and Tokdar, S. T. (2015). Minimax-optimal nonparametric regression in high dimensions. Ann. Statist. 43 652–674.
• Yuan, M. and Zhou, D.-X. (2016). Minimax optimal rates of estimation in high dimensional additive models. Ann. Statist. 44 2564–2593.

#### Supplemental materials

• Supplement to “Doubly penalized estimation in additive regression with high-dimensional data”. We provide proofs and technical tools.