The Annals of Statistics

Additive models with trend filtering

Veeranjaneyulu Sadhanala and Ryan J. Tibshirani

We study additive models built with trend filtering, that is, additive models whose components are each regularized by the (discrete) total variation of their $k$th (discrete) derivative, for a chosen integer $k\geq0$. This results in $k$th degree piecewise polynomial components (e.g., $k=0$ gives piecewise constant components, $k=1$ gives piecewise linear, $k=2$ gives piecewise quadratic, etc.). Analogous to its advantages in the univariate case, additive trend filtering has favorable theoretical and computational properties, thanks in large part to the localized nature of the (discrete) total variation regularizer that it uses. On the theory side, we derive fast error rates for additive trend filtering estimates, and show these rates are minimax optimal when the underlying function is additive and has component functions whose $k$th derivatives are of bounded variation. We also show that these rates are unattainable by additive smoothing splines (and, more generally, by additive models built from linear smoothers). On the computational side, we use backfitting to leverage fast univariate trend filtering solvers; we also describe a new backfitting algorithm whose iterations can be run in parallel, which (as far as we can tell) is the first of its kind. Lastly, we present a number of experiments to examine the empirical performance of additive trend filtering.
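Penalizing the discrete total variation of a component's $k$th discrete derivative amounts, for evenly spaced inputs, to an $\ell_1$ penalty on its $(k+1)$st order differences. The Python sketch below shows how that penalty, a univariate trend filtering solver, and backfitting fit together. It rests on our own assumptions: inputs are treated as evenly spaced once sorted along each coordinate, each component is centered for identifiability, and the criterion targeted is

$$\min_{\theta_1,\dots,\theta_d} \ \frac{1}{2}\Big\|y - \bar{y}\mathbb{1} - \sum_{j=1}^d \theta_j\Big\|_2^2 + \lambda \sum_{j=1}^d \big\|D^{(k+1)} S_j \theta_j\big\|_1 \quad \text{subject to } \mathbb{1}^T\theta_j = 0,\ j=1,\dots,d,$$

where $S_j$ sorts the observations by the $j$th input and $D^{(k+1)}$ is the $(k+1)$st order difference operator. The function names `diff_matrix`, `trend_filter_1d` and `additive_trend_filter` are hypothetical. The dual projected-gradient solver is a simple, slow stand-in we chose, not the fast univariate solvers the abstract refers to, and the loop is ordinary sequential backfitting, not the parallel variant described in the paper.

```python
import numpy as np


def diff_matrix(n, order):
    """Difference operator of the given order, shape (n - order, n).

    Assumes evenly spaced inputs; unevenly spaced inputs need a rescaled operator.
    """
    D = np.eye(n)
    for _ in range(order):
        D = D[1:] - D[:-1]  # compose with a first-difference operator
    return D


def trend_filter_1d(y, k, lam, n_iter=5000):
    """Univariate trend filtering (hypothetical helper):
        minimize_b  0.5 * ||y - b||^2 + lam * ||D^(k+1) b||_1.

    Solved by projected gradient on the dual problem
        minimize_u  0.5 * ||y - D^T u||^2  subject to  |u_i| <= lam,
    whose solution gives the primal fit b = y - D^T u.
    """
    D = diff_matrix(len(y), k + 1)
    step = 1.0 / 4.0 ** (k + 1)  # step <= 1/L, since L = ||D D^T||_2 <= 4^(k+1)
    u = np.zeros(D.shape[0])
    for _ in range(n_iter):
        grad = D @ (D.T @ u - y)                  # gradient of the dual objective
        u = np.clip(u - step * grad, -lam, lam)   # project onto the box |u_i| <= lam
    return y - D.T @ u


def additive_trend_filter(X, y, k, lam, n_cycles=50, n_iter=2000):
    """Sequential backfitting for additive trend filtering (hypothetical helper).

    Each cycle refits component j by univariate trend filtering of the
    partial residual, ordered by the j-th input coordinate.
    """
    n, d = X.shape
    intercept = y.mean()
    fits = np.zeros((n, d))
    orders = [np.argsort(X[:, j]) for j in range(d)]
    for _ in range(n_cycles):
        for j in range(d):
            # partial residual: remove the intercept and every other component
            r = y - intercept - fits.sum(axis=1) + fits[:, j]
            o = orders[j]
            f = np.empty(n)
            f[o] = trend_filter_1d(r[o], k, lam, n_iter)  # fit along sorted inputs
            fits[:, j] = f - f.mean()  # center for identifiability
    return intercept, fits
```

A degree-$k$ polynomial has zero $(k+1)$st differences and so incurs no penalty, which is why the fitted components are piecewise polynomials of degree $k$. For a sufficiently large $\lambda$, the univariate fit is forced into that null space and reduces to the least squares polynomial of degree $k$ (for example, the sample mean when $k=0$). In practice `n_iter` trades accuracy for speed, since this dual solver converges slowly for large $n$ or $k$.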

Article information

Ann. Statist., Volume 47, Number 6 (2019), 3032-3068.

Received: April 2018
Revised: February 2019
First available in Project Euclid: 31 October 2019

Digital Object Identifier: doi:10.1214/19-AOS1833

Mathematics Subject Classification: Primary: 62G08 (Nonparametric regression), 62G20 (Asymptotic properties)

Keywords: additive models; trend filtering; smoothing splines; nonparametric regression; minimax rates; parallel backfitting


Sadhanala, Veeranjaneyulu; Tibshirani, Ryan J. Additive models with trend filtering. Ann. Statist. 47 (2019), no. 6, 3032–3068. doi:10.1214/19-AOS1833.

Supplemental materials

  • Supplement to “Additive models with trend filtering”. Proofs and additional simulations.