Ensemble classification methods such as Random Forest are powerful and versatile. We explore variations on the ensemble approach and demonstrate the strong performance of ensemble versions of Linear Discriminant Analysis (LDA) variants, such as LDA-PCA (LDA applied after a Principal Components Analysis step that reduces dimensionality) and LASSO, in settings with a huge number of features and a small number of samples, such as DNA microarray data. We also demonstrate the value of enriching the ensembles with the features most likely to be informative when only a very small percentage of the features actually carry classification information. Notably, in the case studies we analyzed, the enriched ensemble procedure with LDA-PCA as the base classifier had a misclassification rate that was essentially half that observed with Random Forest.
Digital Object Identifier: 10.1214/11-IMSCOLL816
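
The enriched-ensemble idea summarized above can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how enrichment is carried out, so the sketch assumes that each ensemble member draws its feature subset with probability proportional to a per-feature two-sample t-type statistic computed on the training data. To stay dependency-free, a nearest-centroid rule stands in for the LDA-PCA base classifier. All function names, the toy data, and the parameter values (25 members, 5 features per member) are hypothetical choices made for illustration only.

```python
import random
import statistics
from collections import Counter


def t_score(column, y):
    """Absolute Welch-style two-sample statistic for one feature (labels 0/1)."""
    a = [v for v, lab in zip(column, y) if lab == 0]
    b = [v for v, lab in zip(column, y) if lab == 1]
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / (se + 1e-12)


def weighted_subset(weights, k, rng):
    """Draw k distinct feature indices, favouring features with large weights.

    Uses exponential clocks: keep the k smallest values of Exp(1) / weight.
    """
    return sorted(range(len(weights)),
                  key=lambda j: rng.expovariate(1.0) / weights[j])[:k]


def fit_centroids(X, y, feats):
    """Stand-in base classifier (assumption): class means on the chosen features."""
    return {c: [statistics.mean(x[j] for x, lab in zip(X, y) if lab == c)
                for j in feats]
            for c in sorted(set(y))}


def predict_one(member, x):
    """Assign x to the class whose centroid is nearest on the member's features."""
    feats, cents = member

    def dist(c):
        return sum((x[j] - m) ** 2 for j, m in zip(feats, cents[c]))

    return min(cents, key=dist)


def fit_enriched_ensemble(X, y, n_members=25, subset_size=5, seed=0):
    """Fit base classifiers on feature subsets enriched for informative features."""
    rng = random.Random(seed)
    p = len(X[0])
    # Enrichment weights are computed from the training data only.
    weights = [t_score([row[j] for row in X], y) + 1e-9 for j in range(p)]
    members = []
    for _ in range(n_members):
        feats = weighted_subset(weights, subset_size, rng)
        members.append((feats, fit_centroids(X, y, feats)))
    return members


def predict_ensemble(members, x):
    """Combine the members by majority vote."""
    votes = Counter(predict_one(m, x) for m in members)
    return votes.most_common(1)[0][0]


# Toy data (synthetic): 20 samples, 100 features, only the first 5 carry signal.
data_rng = random.Random(1)
P, INFORMATIVE, SHIFT = 100, 5, 6.0
y_train = [0] * 10 + [1] * 10
X_train = [[data_rng.gauss(0, 1) + (SHIFT if lab == 1 and j < INFORMATIVE else 0.0)
            for j in range(P)] for lab in y_train]

members = fit_enriched_ensemble(X_train, y_train)
x_class0 = [0.0] * P                                       # at the class-0 mean
x_class1 = [SHIFT if j < INFORMATIVE else 0.0 for j in range(P)]  # at the class-1 mean
print(predict_ensemble(members, x_class0), predict_ensemble(members, x_class1))
```

Replacing the weighted draw with a uniform one gives an unenriched random-subspace ensemble, which makes the effect of enrichment easy to compare on the same toy data. A closer approximation of the setting described in the abstract would swap the nearest-centroid rule for a PCA step followed by LDA, fitted separately on each member's feature subset.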