The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 9, Number 4 (2015), 2090-2109.
“Virus hunting” using radial distance weighted discrimination
Motivated by the challenge of using DNA-seq data to identify viruses in human blood samples, we propose a novel classification algorithm called “Radial Distance Weighted Discrimination” (or Radial DWD). This classifier is designed for binary classification, assuming one class is surrounded by the other class in very diverse radial directions, which is seen to be typical for our virus detection data. This separation of the two classes in multiple radial directions naturally motivates the development of Radial DWD. While classical machine learning methods such as the Support Vector Machine and linear Distance Weighted Discrimination can sometimes give reasonable answers for a given data set, their generalizability is severely compromised because of the linear separating boundary. Radial DWD addresses this challenge by using a spherical separating boundary, which is more appropriate in this particular case. Simulations show that for appropriate radial contexts, this gives much better generalizability than linear methods, and also much better than conventional kernel-based (nonlinear) Support Vector Machines, because the latter methods essentially use much of the information in the data for determining the shape of the separating boundary. The effectiveness of Radial DWD is demonstrated for real virus detection.
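The idea of a spherical separating boundary can be illustrated with a small toy sketch. It is not the Radial DWD optimization from the paper: the fitting rule (center at the inner-class mean, radius halfway between the two classes), the synthetic data, and the names `fit_sphere`, `predict_sphere`, and `make_radial_data` are all assumptions made for illustration only.

```python
import numpy as np

def fit_sphere(X_pos, X_neg):
    """Toy spherical boundary (an assumption, not Radial DWD):
    center = mean of the inner class, radius = midpoint between the
    farthest inner point and the nearest outer point."""
    center = X_pos.mean(axis=0)
    d_pos = np.linalg.norm(X_pos - center, axis=1)
    d_neg = np.linalg.norm(X_neg - center, axis=1)
    radius = 0.5 * (d_pos.max() + d_neg.min())
    return center, radius

def predict_sphere(X, center, radius):
    """Return +1 for points inside the sphere and -1 for points outside."""
    dist = np.linalg.norm(X - center, axis=1)
    return np.where(dist <= radius, 1, -1)

def make_radial_data(n, dim, rng):
    """Hypothetical data: the inner class is a tight cloud; each outer
    point is pushed away along one randomly chosen coordinate axis, so
    the outer class surrounds the inner one in many directions."""
    X_pos = 0.1 * rng.standard_normal((n, dim))
    X_neg = 0.1 * rng.standard_normal((n, dim))
    axes = rng.integers(0, dim, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    X_neg[np.arange(n), axes] += 5.0 * signs
    return X_pos, X_neg

rng = np.random.default_rng(0)
X_pos, X_neg = make_radial_data(50, 20, rng)
center, radius = fit_sphere(X_pos, X_neg)

# Evaluate on fresh data, whose outer points may lie in directions
# that were never seen during fitting.
X_pos_new, X_neg_new = make_radial_data(200, 20, rng)
correct = np.concatenate([
    predict_sphere(X_pos_new, center, radius) == 1,
    predict_sphere(X_neg_new, center, radius) == -1,
])
acc = correct.mean()
print(f"held-out accuracy: {acc:.3f}")
```

Because the boundary depends only on distance from the center, new outer-class points in previously unseen directions are still classified as outside, which is the kind of generalizability a single separating hyperplane cannot provide in this setting.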
Received: May 2014
Revised: August 2015
First available in Project Euclid: 28 January 2016
Xiong, Jie; Dittmer, D. P.; Marron, J. S. “Virus hunting” using radial distance weighted discrimination. Ann. Appl. Stat. 9 (2015), no. 4, 2090-2109. doi:10.1214/15-AOAS869. https://projecteuclid.org/euclid.aoas/1453994193
- Supplement to: “Virus hunting” using Radial Distance Weighted Discrimination. In the supplementary materials, we first introduce some useful biology background for virus detection in Section 1 and the DNA alignment process in Section 2, and then discuss insights into the Dirichlet distribution in Section 3. Real data examples and simulation studies are included in Sections 4 and 5, respectively. Theorems and proofs are in Section 6.