The Annals of Applied Probability

Bounding the generalization error of convex combinations of classifiers: balancing the dimensionality and the margins

Vladimir Koltchinskii, Dmitriy Panchenko, and Fernando Lozano



A problem of bounding the generalization error of a classifier $f \in \mathrm{conv}(\mathcal{H})$, where $\mathcal{H}$ is a "base" class of functions (classifiers), is considered. This problem frequently occurs in machine learning, where efficient algorithms that combine simple classifiers into a complex one (such as boosting and bagging) have attracted a lot of attention. Using Talagrand's concentration inequalities for empirical processes, we obtain new, sharper bounds on the generalization error of combined classifiers that take into account both the empirical distribution of "classification margins" and an "approximate dimension" of the classifiers, and study the performance of these bounds in several experiments with learning algorithms.
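The "empirical distribution of classification margins" in the abstract can be made concrete with a small sketch. For labels $y \in \{-1, +1\}$ and a convex combination $f(x) = \sum_i \lambda_i h_i(x)$ of base classifiers, the margin of an example is $y f(x)$, and the empirical margin distribution at level $\delta$ is the fraction of training examples with margin at most $\delta$. The code below is an illustrative computation of that quantity only (function name and array layout are our own; it is not the paper's bound):

```python
import numpy as np

def empirical_margin_distribution(base_preds, weights, y, deltas):
    """Fraction of examples whose classification margin y * f(x) is
    at most delta, where f is a convex combination of base classifiers.

    base_preds : (n_examples, n_base) array of base-classifier outputs
                 h_i(x_j) in [-1, 1]
    weights    : convex-combination coefficients (nonnegative, sum to 1)
    y          : (n_examples,) labels in {-1, +1}
    deltas     : margin levels at which to evaluate the distribution
    """
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    f = base_preds @ weights          # f(x_j) = sum_i lambda_i h_i(x_j)
    margins = y * f                   # margin of each example
    return np.array([(margins <= d).mean() for d in deltas])
```

Note that the value at $\delta = 0$ is just the training error of the sign of $f$; the margin-based bounds of the paper trade off this empirical quantity against a complexity term at each level $\delta$.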

Article information

Ann. Appl. Probab., Volume 13, Number 1 (2003), 213-252.

First available in Project Euclid: 16 January 2003


Primary: 62G05: Estimation
Secondary: 62G20: Asymptotic properties 60F15: Strong theorems

Keywords: generalization error; combined classifier; margin; approximate dimension; empirical process; Rademacher process; random entropies; concentration inequalities; boosting; bagging


Koltchinskii, Vladimir; Panchenko, Dmitriy; Lozano, Fernando. Bounding the generalization error of convex combinations of classifiers: balancing the dimensionality and the margins. Ann. Appl. Probab. 13 (2003), no. 1, 213--252. doi:10.1214/aoap/1042765667.



  • [1] ANTHONY, M. and BARTLETT, P. (1999). Neural Network Learning: Theoretical Foundations. Cambridge Univ. Press.
  • [2] BARTLETT, P. (1998). The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network. IEEE Trans. Inform. Theory 44 525-536.
  • [3] BARTLETT, P., BOUCHERON, S. and LUGOSI, G. (2001). Model selection and error estimation. Machine Learning 48 85-113.
  • [4] BLAKE, C. L. and MERZ, C. J. (1998). UCI repository of machine learning databases. Available at mlearn/MLRepository.html.
  • [5] BREIMAN, L. (1996). Bagging predictors. Machine Learning 24 123-140.
  • [6] BREIMAN, L. (1998). Arcing classifiers. Ann. Statist. 26 801-849.
  • [7] CORTES, C. and VAPNIK, V. (1995). Support-vector networks. Machine Learning 20 273-297.
  • [8] DEVROYE, L., GYÖRFI, L. and LUGOSI, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.
  • [9] DUFFY, N. and HELMBOLD, D. (1999). A geometric approach to leveraging weak learners. Computational Learning Theory. Lecture Notes in Comput. Sci. 18-33. Springer, New York.
  • [10] FREUND, Y. and SCHAPIRE, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119-139.
  • [11] GROVE, A. and SCHUURMANS, D. (1998). Boosting in the limit: Maximizing the margin of learned ensembles. In Proceedings of the Fifteenth National Conference on Artificial Intelligence 692-699. AAAI Press, Menlo Park, California.
  • [12] HUSH, D. and HORNE, B. (1998). Efficient algorithms for function approximation with piecewise linear sigmoids. IEEE Trans. Neural Networks 9 1129-1141.
  • [13] KEARNS, M., MANSOUR, Y., NG, A. and RON, D. (1997). An experimental and theoretical comparison of model selection methods. Machine Learning 27 7-50.
  • [14] KOLTCHINSKII, V. (2001). Bounds on margin distributions in learning problems. Preprint. Available at panchenk/.
  • [15] KOLTCHINSKII, V. (2001). Rademacher penalties and structural risk minimization. IEEE Trans. Inform. Theory 47 1902-1914.
  • [16] KOLTCHINSKII, V. and PANCHENKO, D. (2000). Rademacher processes and bounding the risk of function learning. In High Dimensional Probability II (D. Mason, E. Giné and J. Wellner, eds.) 443-457. Birkhäuser, Boston.
  • [17] KOLTCHINSKII, V. and PANCHENKO, D. (2002). Empirical margin distribution and bounding the generalization error of combined classifiers. Ann. Statist. 30 1-50.
  • [18] KOLTCHINSKII, V., PANCHENKO, D. and LOZANO, F. (2001). Bounding the generalization error of neural networks and combined classifiers. In Proceedings of Thirteenth International Conference on Advances in Neural Information Processing Systems 245-251. MIT Press.
  • [19] LOZANO, F. and KOLTCHINSKII, V. (2002). Direct optimization of simple cost functions of the margin. In Proceedings of the First International NAISO Congress on Neuro Fuzzy Technologies. Academic Press, Amsterdam.
  • [20] MASON, L., BARTLETT, P. and BAXTER, J. (2000). Improved generalization through explicit optimization of margins. Machine Learning 38 243-255.
  • [21] MASON, L., BAXTER, J., BARTLETT, P. and FREAN, M. (2000). Functional gradient techniques of combining hypotheses. In Advances in Large Margin Classifiers (A. J. Smola, P. Bartlett, B. Schölkopf and C. Schuurmans, eds.) 221-246. MIT Press.
  • [22] MASSART, P. (2000). About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab. 28 863-884.
  • [23] PISIER, G. (1981). Remarques sur un résultat non publié de B. Maurey. In Séminaire d'Analyse Fonctionnelle 1980-1981, Exposé 5. École Polytechnique, Palaiseau.
  • [24] SCHAPIRE, R. E., FREUND, Y., BARTLETT, P. and LEE, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651-1687.
  • [25] TALAGRAND, M. (1996). New concentration inequalities in product spaces. Invent. Math. 126 505-563.
  • [26] VAN DER VAART, A. W. and WELLNER, J. (1996). Weak Convergence of Empirical Processes with Applications to Statistics. Springer, New York.
  • [27] VAPNIK, V. (1998). Statistical Learning Theory. Wiley, New York.