Open Access
VOL. 8 | 2012
Ensemble classifiers
Dhammika Amaratunga, Javier Cabrera, Yauheniya Cherckas, Yung-Seop Lee

Editor(s) Dominique Fourdrinier, Éric Marchand, Andrew L. Rukhin

Inst. Math. Stat. (IMS) Collect., 2012: 235-246 (2012) DOI: 10.1214/11-IMSCOLL816

Abstract

Ensemble classification methods such as Random Forest are powerful and versatile classifiers. We explore variations in the ensemble approach and demonstrate the strong performance of ensemble versions of Linear Discriminant Analysis (LDA) variants such as LDA-PCA (LDA after a Principal Components Analysis step to reduce dimensionality) and LASSO in situations characterized by a huge number of features and a small number of samples, such as DNA microarray data. We also demonstrate the value of enriching the ensembles with features that are most likely to be informative in situations where only a very small percentage of the features actually carry classification information. Notably, in the case studies we analyzed, the enriched ensemble procedure with LDA-PCA as base classifier had a misclassification rate that was essentially half that observed with Random Forest.
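
The abstract describes the enriched ensemble only in words, so a rough sketch of the idea may help. The Python code below is an illustration under stated assumptions, not the authors' implementation. Each ensemble member is trained on a bootstrap sample and on a feature subset drawn with probability proportional to a per-feature weight, and the base classifier is a two-class LDA fitted after a PCA step. The weighting rule (squared Welch t statistics), the function names, the default parameter values, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def feature_weights(X, y):
    """Hypothetical enrichment weights: squared Welch t statistic per feature,
    normalised to sum to 1 (an assumption, not the paper's weighting scheme)."""
    a, b = X[y == 0], X[y == 1]
    se2 = a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b)
    t2 = (a.mean(axis=0) - b.mean(axis=0)) ** 2 / (se2 + 1e-12)
    return t2 / t2.sum()

def fit_lda_pca(X, y, n_components):
    """PCA (via SVD) followed by two-class LDA in the reduced space."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    P = vt[:n_components].T                 # projection matrix, (features, k)
    Z = (X - mu) @ P
    m0, m1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    R = np.vstack([Z[y == 0] - m0, Z[y == 1] - m1])
    # Pooled within-class covariance with a small ridge for numerical stability.
    S = R.T @ R / max(len(Z) - 2, 1) + 1e-6 * np.eye(P.shape[1])
    w = np.linalg.solve(S, m1 - m0)         # discriminant direction
    c = w @ (m0 + m1) / 2                   # threshold, equal priors assumed
    return mu, P, w, c

def predict_lda_pca(model, X):
    mu, P, w, c = model
    return (((X - mu) @ P) @ w > c).astype(int)

def fit_enriched_ensemble(X, y, n_members=51, n_features=20,
                          n_components=3, seed=0):
    """Each member: stratified bootstrap rows + weighted feature subset."""
    rng = np.random.default_rng(seed)
    p = feature_weights(X, y)
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    members = []
    for _ in range(n_members):
        rows = np.concatenate([rng.choice(idx0, size=idx0.size),
                               rng.choice(idx1, size=idx1.size)])
        cols = rng.choice(X.shape[1], size=n_features, replace=False, p=p)
        model = fit_lda_pca(X[np.ix_(rows, cols)], y[rows], n_components)
        members.append((cols, model))
    return members

def predict_ensemble(members, X):
    """Majority vote over the ensemble members."""
    votes = np.mean([predict_lda_pca(m, X[:, cols]) for cols, m in members],
                    axis=0)
    return (votes > 0.5).astype(int)

# Synthetic stand-in for microarray-like data: 1000 features, 10 informative.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 1000))
X[y == 1, :10] += 2.0                       # shift informative features in class 1
perm = rng.permutation(100)
train, test = perm[:60], perm[60:]

ensemble = fit_enriched_ensemble(X[train], y[train])
accuracy = (predict_ensemble(ensemble, X[test]) == y[test]).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

In this sketch the weights are computed from the training rows only, so the held-out accuracy is not contaminated by the feature weighting. Replacing `p` with uniform weights gives an unenriched random-subspace ensemble for comparison.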

Information

Published: 1 January 2012
First available in Project Euclid: 14 March 2012

zbMATH: 1320.62142
MathSciNet: MR3202514

Digital Object Identifier: 10.1214/11-IMSCOLL816

Subjects:
Primary: 62P10, 68T05, 68T10

Keywords: classification, ensemble, Lasso, linear discriminant analysis, microarray, Random forest

Rights: Copyright © 2012, Institute of Mathematical Statistics
