Bayesian Analysis

Improving classification when a class hierarchy is available using a hierarchy-based prior

Babak Shahbaba and Radford M. Neal



We introduce a new method for building classification models when we have prior knowledge of how the classes can be arranged in a hierarchy, based on how easily they can be distinguished. The new method uses a Bayesian form of the multinomial logit (MNL, a.k.a. "softmax") model, with a prior that introduces correlations between the parameters for classes that are nearby in the tree. On simulated data, we compare the performance of the new method with that of the ordinary MNL model and of a model that uses the hierarchy in a different way. We also test the new method on page layout analysis and document classification problems, and find that it performs better than the other methods.
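The abstract does not spell out how a prior can correlate the parameters of nearby classes. The following is a minimal sketch, assuming one simple way to build such a prior: every node of the class tree gets its own independent Gaussian parameter vector, and each class's MNL coefficients are the sum of the vectors on its path from the root. Classes that share ancestors then share terms and are positively correlated a priori. The four-class tree, the node names, the fixed standard deviations in `SIGMA`, and all function names are hypothetical; the sketch omits intercepts, hyperpriors, and posterior sampling, and is not the authors' implementation.

```python
import math
import random

# Hypothetical four-class hierarchy (illustration only, not from the paper):
#            root
#          /      \
#         A        B
#       /   \    /   \
#      c0   c1  c2   c3
# Each class is identified with the tree nodes on its path below the root.
PATHS = {0: ["A", "c0"], 1: ["A", "c1"], 2: ["B", "c2"], 3: ["B", "c3"]}

# Assumed fixed prior standard deviation for each node's parameters.
SIGMA = {"A": 1.0, "B": 1.0, "c0": 0.5, "c1": 0.5, "c2": 0.5, "c3": 0.5}


def sample_class_coefficients(paths, sigma, n_features, rng):
    """Draw one coefficient vector per class from the sketched tree prior.

    Each node k gets phi_k ~ N(0, sigma_k^2 I), drawn independently; a class's
    coefficients are the sum of the phi vectors for the nodes on its path.
    """
    nodes = sorted({node for path in paths.values() for node in path})
    phi = {node: [rng.gauss(0.0, sigma[node]) for _ in range(n_features)]
           for node in nodes}
    return {cls: [sum(phi[node][d] for node in path) for d in range(n_features)]
            for cls, path in paths.items()}


def prior_covariance(paths, sigma, i, j):
    """Prior covariance of one coefficient between classes i and j:
    the sum of sigma_k^2 over the nodes their two paths share."""
    shared = set(paths[i]) & set(paths[j])
    return sum(sigma[node] ** 2 for node in shared)


def mnl_probabilities(beta, x):
    """MNL (softmax) class probabilities, P(y = c | x) proportional to
    exp(x . beta_c); intercepts are omitted for brevity."""
    scores = {c: sum(b * xi for b, xi in zip(coef, x)) for c, coef in beta.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    beta = sample_class_coefficients(PATHS, SIGMA, n_features=3, rng=rng)
    probs = mnl_probabilities(beta, [1.0, -0.5, 2.0])
    print({c: round(p, 3) for c, p in probs.items()})
    print(prior_covariance(PATHS, SIGMA, 0, 1))  # siblings share node A: 1.0
    print(prior_covariance(PATHS, SIGMA, 0, 2))  # no shared node: 0
```

With these assumed settings each coefficient has prior variance 1.0 + 0.25 = 1.25, so sibling classes such as c0 and c1 have prior correlation 1.0 / 1.25 = 0.8, while classes under different parents are uncorrelated. An ordinary MNL model with independent priors corresponds to giving every class its own single node, so that no terms are shared.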

Article information

Bayesian Anal., Volume 2, Number 1 (2007), 221-237.

First available in Project Euclid: 22 June 2012



Keywords: Hierarchical Classification; Bayesian Models; Multinomial Logistic Regression; Page Layout Analysis; Document Classification


Shahbaba, Babak; Neal, Radford M. Improving classification when a class hierarchy is available using a hierarchy-based prior. Bayesian Anal. 2 (2007), no. 1, 221–237. doi:10.1214/07-BA209.

