## The Annals of Statistics

### Approximation and learning by greedy algorithms

#### Abstract

We consider the problem of approximating a given element $f$ from a Hilbert space $\mathcal{H}$ by means of greedy algorithms and the application of such procedures to the regression problem in statistical learning theory. We improve on the existing theory of convergence rates for both the orthogonal greedy algorithm and the relaxed greedy algorithm, as well as for the forward stepwise projection algorithm. For all these algorithms, we prove convergence results for a variety of function classes and not simply those that are related to the convex hull of the dictionary. We then show how these bounds for convergence rates lead to a new theory for the performance of greedy algorithms in learning. In particular, we build upon the results in [IEEE Trans. Inform. Theory 42 (1996) 2118–2132] to construct learning algorithms based on greedy approximations which are universally consistent and provide provable convergence rates for large classes of functions. The use of greedy algorithms in the context of learning is very appealing since it greatly reduces the computational burden when compared with standard model selection using general dictionaries.
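The abstract names the orthogonal greedy algorithm without spelling it out. As an illustration only, the following is a minimal sketch of its standard textbook formulation, restricted to the finite-dimensional case $\mathcal{H} = \mathbb{R}^d$ with a finite dictionary of unit-norm vectors. At each step it selects the dictionary element with the largest inner product against the current residual, then replaces the approximation by the orthogonal projection of $f$ onto the span of all elements selected so far. The function name `orthogonal_greedy`, its parameters, and the toy dictionary are hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_greedy(f, dictionary, steps, tol=1e-12):
    """Sketch of the orthogonal greedy algorithm in R^d (illustrative only).

    f          -- target vector (list of floats)
    dictionary -- list of vectors, assumed to have unit norm
    steps      -- maximum number of greedy iterations
    Returns (selected indices, approximation, residual norm after each step).
    """
    residual = list(f)
    basis = []      # orthonormal basis of the span of the selected elements
    selected = []
    norms = [math.sqrt(dot(residual, residual))]
    for _ in range(steps):
        # Greedy step: element with the largest |<residual, g>|.
        k = max(range(len(dictionary)),
                key=lambda i: abs(dot(residual, dictionary[i])))
        if abs(dot(residual, dictionary[k])) <= tol:
            break  # residual is (numerically) orthogonal to the dictionary
        # Gram-Schmidt: orthonormalize the new element against the basis.
        q = list(dictionary[k])
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        q_norm = math.sqrt(dot(q, q))
        if q_norm <= tol:
            break  # element already lies in the selected span
        q = [qi / q_norm for qi in q]
        basis.append(q)
        selected.append(k)
        # Projection step: remove the component of the residual along q,
        # so the residual equals f minus its projection onto the new span.
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
        norms.append(math.sqrt(dot(residual, residual)))
    approximation = [fi - ri for fi, ri in zip(f, residual)]
    return selected, approximation, norms


# Toy example with a redundant dictionary in R^3 (assumed data).
s = 1.0 / math.sqrt(2.0)
toy_dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [s, s, 0.0]]
toy_selected, toy_approx, toy_norms = orthogonal_greedy(
    [3.0, 1.0, 0.0], toy_dictionary, steps=3)
print(toy_selected, toy_norms)
```

Because each step projects onto a larger subspace, the residual norms returned by this sketch can never increase. The rates at which they decay for various function classes are the subject of the paper and are not reproduced here.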

#### Article information

Source
Ann. Statist., Volume 36, Number 1 (2008), 64–94.

Dates
First available in Project Euclid: 1 February 2008

Permanent link to this document
https://projecteuclid.org/euclid.aos/1201877294

Digital Object Identifier
doi:10.1214/009053607000000631

Mathematical Reviews number (MathSciNet)
MR2387964

Zentralblatt MATH identifier
1138.62019

#### Citation

Barron, Andrew R.; Cohen, Albert; Dahmen, Wolfgang; DeVore, Ronald A. Approximation and learning by greedy algorithms. Ann. Statist. 36 (2008), no. 1, 64–94. doi:10.1214/009053607000000631. https://projecteuclid.org/euclid.aos/1201877294

#### References

• Avellaneda, M., Davis, G. and Mallat, S. (1997). Adaptive greedy approximations. Constr. Approx. 13 57–98.
• Barron, A. R. (1990). Complexity regularization with application to artificial neural networks. In Nonparametric Functional Estimation and Related Topics (G. Roussas, ed.) 561–576. Kluwer Academic Publishers, Dordrecht.
• Barron, A. R. (1992). Neural net approximation. Proc. 7th Yale Workshop on Adaptive and Learning Systems (K. S. Narendra, ed.) 1 69–72. New Haven, CT.
• Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inform. Theory 39 930–945.
• Barron, A. and Cheang, G. H. L. (2001). Penalized least squares, model selection, convex hull classes, and neural nets. In Proceedings of the 9th ESANN, Brugge, Belgium (M. Verleysen, ed.) 371–376. De-Facto Press.
• Bennett, C. and Sharpley, R. (1988). Interpolation of Operators. Academic Press, Boston.
• Bergh, J. and Löfström, J. (1976). Interpolation Spaces. Springer, Berlin.
• DeVore, R. (1998). Nonlinear approximation. In Acta Numerica 7 51–150. Cambridge Univ. Press.
• DeVore, R. and Temlyakov, V. (1996). Some remarks on greedy algorithms. Adv. Comput. Math. 5 173–187.
• Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407–499.
• Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, Berlin.
• Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Springer, New York.
• Huang, C., Cheang, G. H. L. and Barron, A. R. Risk of penalized least squares, greedy term selection, and L1-penalized estimators from flexible function libraries. Yale Department of Statistics Report.
• Jones, L. K. (1992). A simple lemma on greedy approximation in Hilbert spaces and convergence rates for projection pursuit regression and neural network training. Ann. Statist. 20 608–613.
• Konyagin, S. V. and Temlyakov, V. N. (1999). Rate of convergence of pure greedy algorithm. East J. Approx. 5 493–499.
• Kurkova, V. and Sanguineti, M. (2001). Bounds on rates of variable-basis and neural-network approximation. IEEE Trans. Inform. Theory 47 2659–2665.
• Kurkova, V. and Sanguineti, M. (2002). Comparison of worst case errors in linear and neural network approximation. IEEE Trans. Inform. Theory 48 264–275.
• Lee, W. S., Bartlett, P. and Williamson, R. C. (1996). Efficient agnostic learning of neural networks with bounded fan-in. IEEE Trans. Inform. Theory 42 2118–2132.
• Livshitz, E. D. and Temlyakov, V. N. (2003). Two lower estimates in greedy approximation. Constr. Approx. 19 509–524.
• Petrushev, P. P. (1998). Approximation by ridge functions and neural networks. SIAM J. Math. Anal. 30 155–189.
• Temlyakov, V. (2003). Nonlinear methods of approximation. J. FOCM 3 33–107.
• Temlyakov, V. (2005). Greedy algorithms with restricted depth search. Proc. of the Steklov Inst. Math. 248 255–267.
• Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. J. Roy. Statist. Soc. Ser. B 58 267–288.