Open Access
2021 Double data piling leads to perfect classification
Woonyoung Chang, Jeongyoun Ahn, Sungkyu Jung
Electron. J. Statist. 15(2): 6382-6428 (2021). DOI: 10.1214/21-EJS1945

Abstract

Data piling refers to the phenomenon that training data vectors from each class project to a single point for classification. While this interesting phenomenon has been a key to understanding many distinctive properties of high-dimensional discrimination, the theoretical underpinning of data piling is far from properly established. In this work, high-dimensional asymptotics of data piling is investigated under a spiked covariance model, which reveals its close connection to the well-known ridged linear classifier. In particular, by projecting the ridge discriminant vector onto the subspace spanned by the leading sample principal component directions and the maximal data piling vector, we show that a negatively ridged discriminant vector can asymptotically achieve data piling of independent test data, essentially yielding a perfect classification. The second data piling direction is obtained purely from training data and shown to have a maximal property. Furthermore, asymptotic perfect classification occurs only along the second data piling direction.
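The first kind of data piling described above (training data from each class projecting to a single point) can be illustrated numerically. The following is a minimal sketch, not the authors' procedure: it assumes the commonly used form of the maximal data piling direction, namely the pseudo-inverse of the total sample covariance applied to the difference of class means, which is proportional to the minimum-norm least-squares solution computed here. The function name `mdp_direction`, the Gaussian toy data, and the sizes `n_per_class` and `p` are all illustrative assumptions. The negatively ridged, second data piling direction studied in the paper is not reproduced.

```python
import numpy as np

def mdp_direction(X, y):
    """Unit vector w with constant projection within each training class.

    Assumption: w is taken proportional to pinv(Xc) @ yc, where Xc and yc
    are the globally centered data and +/-1 labels. When p > n this solves
    Xc @ w = yc exactly, so each class collapses to one projected value.
    """
    Xc = X - X.mean(axis=0)          # center the features globally
    yc = y - y.mean()                # center the +/-1 labels
    w = np.linalg.pinv(Xc) @ yc      # minimum-norm least-squares solution
    return w / np.linalg.norm(w)

# Hypothetical high-dimension, low-sample-size toy data (p much larger than n).
rng = np.random.default_rng(0)
n_per_class, p = 10, 500
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per_class, p)),   # class -1
    rng.normal(0.3, 1.0, size=(n_per_class, p)),   # class +1
])
y = np.repeat([-1.0, 1.0], n_per_class)

w = mdp_direction(X, y)
proj = X @ w                          # one-dimensional projections

# Within-class spread of the projections and the gap between the two piles.
spread_neg = proj[y < 0].max() - proj[y < 0].min()
spread_pos = proj[y > 0].max() - proj[y > 0].min()
gap = proj[y > 0].mean() - proj[y < 0].mean()

print("within-class spreads:", spread_neg, spread_pos)
print("gap between piles:", gap)
```

In this sketch both within-class spreads are zero up to floating-point error, while the gap between the two piles is strictly positive. Independent test data projected onto the same direction would generally not pile up this way, which is the gap the second data piling direction in the abstract addresses.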

Funding Statement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A2C2002256, 2021R1A2C1093526).

Acknowledgments

We would like to thank the Editor, the Associate Editor, and the anonymous reviewers, whose comments and suggestions helped to improve and clarify our manuscript. We would also like to thank Mr. Taehyun Kim for constructive criticism of the manuscript.

Citation

Woonyoung Chang, Jeongyoun Ahn, Sungkyu Jung. "Double data piling leads to perfect classification." Electron. J. Statist. 15(2): 6382-6428, 2021. https://doi.org/10.1214/21-EJS1945

Information

Received: 1 July 2021; Published: 2021
First available in Project Euclid: 27 December 2021

Digital Object Identifier: 10.1214/21-EJS1945

Subjects:
Primary: 62H25, 62H30
Secondary: 62J07

Keywords: discrimination, high dimension low sample size, maximal data piling, negative ridge, spike covariance model
