The Annals of Statistics

Learning by mirror averaging

A. Juditsky, P. Rigollet, and A. B. Tsybakov

Source: Ann. Statist. Volume 36, Number 5 (2008), 2183-2206.

Abstract

Given a finite collection of estimators or classifiers, we study the problem of model selection type aggregation, that is, we construct a new estimator or classifier, called aggregate, which is nearly as good as the best among them with respect to a given risk criterion. We define our aggregate by a simple recursive procedure which solves an auxiliary stochastic linear programming problem related to the original nonlinear one and constitutes a special case of the mirror averaging algorithm. We show that the aggregate satisfies sharp oracle inequalities under some general assumptions. The results are applied to several problems including regression, classification and density estimation.

Primary Subjects: 62G08
Secondary Subjects: 62C20, 62G05, 62G20
Keywords: Learning; aggregation; oracle inequalities; mirror averaging; model selection; stochastic optimization

Full-text: Access denied (no subscription detected)

We're sorry, but we are unable to provide you with the full text of this article because we are not able to identify you as a subscriber.
If you have a personal subscription to this journal, then please login. If you are already logged in, then you may need to update your profile to register your subscription. Read more about accessing full-text
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1223908089
Digital Object Identifier: doi:10.1214/07-AOS546
Mathematical Reviews number (MathSciNet): MR2458184
Zentralblatt MATH identifier: 05368488

References

[1] Bartlett, P. L., Boucheron, S. and Lugosi, G. (2002). Model selection and error estimation. Machine Learning 48 85–113.
[2] Belomestny, D. and Spokoiny, V. (2007). Spatial aggregation of local likelihood estimates with applications to classification. Ann. Statist. 35 2287–2311.
Mathematical Reviews (MathSciNet): MR2363972
Digital Object Identifier: doi:10.1214/009053607000000271
Project Euclid: euclid.aos/1194461731
[3] Ben-Tal, A. and Nemirovski, A. S. (1999). The conjugate barrier mirror descent method for non-smooth convex optimization. MINERVA Optimization Center Report, Faculty of Industrial Engineering and Management, Technion—Israel Institute of Technology, Haifa. Available at http://iew3.technion.ac.il/Labs/Opt/opt/Pap/CP_MD.pdf.
[4] Boucheron, S., Bousquet, O. and Lugosi, G. (2005). Theory of classification: Some recent advances. ESAIM Probab. Statist. 9 323–375.
Mathematical Reviews (MathSciNet): MR2182250
Digital Object Identifier: doi:10.1051/ps:2005018
[5] Bunea, F. and Nobel, A. (2005). Sequential procedures for aggregating arbitrary estimators of a conditional mean. Manuscript. Available at http://www.stat.fsu.edu/~flori.
[6] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674–1697.
Mathematical Reviews (MathSciNet): MR2351101
Digital Object Identifier: doi:10.1214/009053606000001587
Project Euclid: euclid.aos/1188405626
[7] Catoni, O. (1997). A mixture approach to universal model selection. Preprint LMENS-97-30, Ecole Normale Supérieure, Paris.
[8] Catoni, O. (1999). “Universal” aggregation rules with exact bias bounds. Preprint 510, Laboratoire de Probabilités et Modèles Aléatoires, Univ. Paris 6 and Paris 7. Available at http://www.proba.jussieu.fr/mathdoc/preprints/index.html#1999.
[9] Catoni, O. (2004). Statistical Learning Theory and Stochastic Optimization. Ecole d’Eté de Probabilités de Saint-Flour XXXI—2001. Lecture Notes in Math. 1851. Springer, New York.
Mathematical Reviews (MathSciNet): MR2163920
Zentralblatt MATH: 1076.93002
[10] Cesa-Bianchi, N. and Lugosi, G. (2006) Prediction, Learning, and Games. Cambridge Univ. Press.
Mathematical Reviews (MathSciNet): MR2409394
Zentralblatt MATH: 05028972
[11] Haussler, D., Kivinen, J. and Warmuth, M. K. (1998). Sequential prediction if individual sequences under general loss functions. IEEE Trans. Inform. Theory 44 1906–1925.
Mathematical Reviews (MathSciNet): MR1664051
Digital Object Identifier: doi:10.1109/18.705569
[12] Juditsky, A., Nazin, A., Tsybakov, A. and Vayatis, N. (2005). Recursive aggregation of estimators via the mirror descent algorithm with averaging. Problems Inform. Transmission 41 368–384.
[13] Kivinen, J. and Warmuth, M. K. (1999). Averaging expert predictions. In Proc. Fourth. European Conf. on Computational Learning Theory (H. U. Simon and P. Fischer, eds.) 153–167. Springer, Berlin.
Mathematical Reviews (MathSciNet): MR1724987
Digital Object Identifier: doi:10.1007/3-540-49097-3_13
[14] Lee, W. S., Bartlett, P. L. and Williamson, R.C. (1996). Efficient agnostic learning with bounded fan-in. IEEE Trans. Inform. Theory 42 2118–2132.
Mathematical Reviews (MathSciNet): MR1447518
Digital Object Identifier: doi:10.1109/18.485715
[15] Leung, G. and Barron, A. R. (2006). Information theory and mixing least-squares regressions. IEEE Trans. Inform. Theory 52 3396–3410.
Mathematical Reviews (MathSciNet): MR2242356
Digital Object Identifier: doi:10.1109/TIT.2006.878172
[16] Lugosi, G. and Wegkamp, M. (2004). Complexity regularization via localized random penalties. Ann. Statist. 32 1679–1697.
Mathematical Reviews (MathSciNet): MR2089138
Digital Object Identifier: doi:10.1214/009053604000000463
Project Euclid: euclid.aos/1091626183
[17] Massart, P. (2007). Concentration Inequalities and Model Selection. Ecole d’Eté de Probabilités de Saint-Flour XXXIII—2003. Lecture Notes in Math. 1896. Springer, New York.
Mathematical Reviews (MathSciNet): MR2319879
[18] Nemirovski, A. (2000). Topics in non-parametric statistics. Ecole d’Eté de Probabilités de Saint-Flour XXVIII—1998. Lecture Notes in Math. 1738 85–277. Springer, New York.
Mathematical Reviews (MathSciNet): MR1775640
Zentralblatt MATH: 0998.62033
[19] Nemirovski, A. S. and Yudin, D. B. (1983). Problem Complexity and Method Efficiency in Optimization, Wiley, Chichester.
Mathematical Reviews (MathSciNet): MR702836
Zentralblatt MATH: 0501.90062
[20] Nesterov, Y. (2007). Primal–dual subgradient methods for convex problems. Mathematical Programming. Published online DOI: 10.1007/s10107-007-0149-x. Also available as CORE Discussion Paper n. 2005/67, Center for Operation Research and Econometrics, Louvain-la-Neuve, Belgium, 2005.
[21] Petrov, V. V. (1995) Limit Theorems of Probability Theory. Clarendon Press, Oxford.
Mathematical Reviews (MathSciNet): MR1353441
Zentralblatt MATH: 0826.60001
[22] Samarov, A. and Tsybakov, A. (2007). Aggregation of density estimators and dimension reduction. In Advances in Statistical Modeling and Inference. Essays in Honor of Kjell A. Doksum (V. Nair, ed.) 233–251. World Scientific, Singapore.
Mathematical Reviews (MathSciNet): MR2416139
[23] Singer, A. and Feder, M. (1999). Universal linear prediction by model order weighting. IEEE Trans. Signal Process. 47 2685–2699.
[24] Tsybakov, A. (2003). Optimal rates of aggregation. In Computational Learning Theory and Kernel Machines, COLT-2003 (B. Schölkopf and M. Warmuth, eds.) 303–313. Springer, Heidelberg.
[25] Tsybakov, A. (2004). Introduction à l’estimation non paramétrique. Springer, Berlin.
[26] Vovk, V. (1990). Aggregating strategies. In Proc. 3rd Annual Workshop on Computational Learning Theory 372–383. Morgan Kaufman, San Mateo.
[27] Vovk, V. (2001). Competitive on-line statistics. Internat. Statist. Rev. 69 213–248.
[28] Wegkamp, M. (2003). Model selection in nonparametric regression. Ann. Statist. 31 252–273.
Mathematical Reviews (MathSciNet): MR1962506
Digital Object Identifier: doi:10.1214/aos/1046294464
Project Euclid: euclid.aos/1046294464
[29] Yang, Y. (2000). Mixing strategies for density estimation. Ann. Statist. 28 75–87.
Mathematical Reviews (MathSciNet): MR1762904
Digital Object Identifier: doi:10.1214/aos/1016120365
Project Euclid: euclid.aos/1016120365
[30] Zhang, T. (2006). From epsilon-entropy to KL-complexity: Analysis of minimum information complexity density estimation. Ann. Statist. 34 2180–2210.
Mathematical Reviews (MathSciNet): MR2291497
Digital Object Identifier: doi:10.1214/009053606000000704
Project Euclid: euclid.aos/1169571794

2009 © Institute of Mathematical Statistics