Open Access
VOL. 3 | 2008
An ensemble approach to improved prediction from multitype data
Jennifer Clarke, David Seo

Editor(s) Bertrand Clarke, Subhashis Ghosal

Inst. Math. Stat. (IMS) Collect., 2008: 302-317 (2008) DOI: 10.1214/074921708000000219


We have developed a strategy for the analysis of newly available binary data to improve outcome predictions based on existing data (binary or non-binary). Our strategy involves two modeling approaches for the newly available data, one combining binary covariate selection via LASSO with logistic regression and one based on logic trees. The results of these models are then compared to the results of a model based on existing data with the objective of combining model results to achieve the most accurate predictions. The combination of model predictions is aided by the use of support vector machines to identify subspaces of the covariate space in which specific models lead to successful predictions. We demonstrate our approach in the analysis of single nucleotide polymorphism (SNP) data and traditional clinical risk factors for the prediction of coronary heart disease.
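The combination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two base models are hypothetical stand-ins (`snp_model` for a model fitted to the new binary data, `clinical_model` for a model fitted to the existing data), the toy data are simulated, and the support vector machine is a plain linear SVM trained by subgradient descent in place of a library solver. The sketch assumes the gate is trained on whether the first model predicted each training case correctly, then used to choose between the two models for new cases.

```python
import random

def train_linear_svm(X, y, lam=0.01, eta=0.05, epochs=50, seed=0):
    """Linear SVM fitted by subgradient descent on the L2-regularised hinge loss.
    X: list of feature lists; y: labels in {-1, +1}. Returns (weights, bias)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Shrink the weights (L2 penalty), then step along the hinge
            # subgradient only when the margin is violated.
            w = [wj - eta * lam * wj for wj in w]
            if y[i] * score < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def gate_prefers_a(gate, x):
    """True when the SVM places x in the region where model A tends to succeed."""
    w, b = gate
    return sum(wj * xj for wj, xj in zip(w, x)) + b >= 0

def fit_gate(X, y, model_a):
    """Train the SVM on +1/-1 labels marking where model A predicted correctly."""
    success = [1 if model_a(x) == yi else -1 for x, yi in zip(X, y)]
    return train_linear_svm(X, success)

def ensemble_predict(x, model_a, model_b, gate):
    """Use model A inside its region of success, model B elsewhere."""
    return model_a(x) if gate_prefers_a(gate, x) else model_b(x)

# Hypothetical stand-ins for the fitted base models (assumptions, not from the paper).
def snp_model(x):
    return x[1]       # predicts from a binary SNP-like covariate

def clinical_model(x):
    return x[2]       # predicts from a binary clinical-risk covariate

def make_toy_data(n, seed):
    """Simulated data: x[0] selects which covariate actually drives the outcome."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(3)] for _ in range(n)]
    y = [x[1] if x[0] == 1 else x[2] for x in X]
    return X, y

def accuracy(preds, y):
    return sum(p == t for p, t in zip(preds, y)) / len(y)

X_train, y_train = make_toy_data(200, seed=1)
X_test, y_test = make_toy_data(100, seed=2)

gate = fit_gate(X_train, y_train, snp_model)
ensemble_preds = [ensemble_predict(x, snp_model, clinical_model, gate) for x in X_test]

print("SNP model accuracy:     ", accuracy([snp_model(x) for x in X_test], y_test))
print("Clinical model accuracy:", accuracy([clinical_model(x) for x in X_test], y_test))
print("Ensemble accuracy:      ", accuracy(ensemble_preds, y_test))
```

In practice the stand-in functions would be replaced by fitted models, such as a LASSO-selected logistic regression or a logic tree, and the gate could be any SVM implementation with a kernel suited to the covariates.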


Published: 1 January 2008
First available in Project Euclid: 28 April 2008

MathSciNet: MR2459232

Primary: 62H30, 62M20
Secondary: 62P10

Keywords: model ensembles, prediction, single nucleotide polymorphism (SNP), support vector machines, variable selection

Rights: Copyright © 2008, Institute of Mathematical Statistics
