Abstract
The surroundings of a cancerous tumor impact how it grows and develops in humans. New data from early breast cancer patients contain information on the collagen fibers surrounding the tumorous tissue, offering hope of finding additional biomarkers for diagnosis and prognosis, but they pose two challenges for typical analysis. Each image section contains information on hundreds of fibers, and each tissue has multiple image sections contributing to a single prediction of tumor vs. nontumor. This nested relationship of fibers within image sections within tissue samples requires a specialized analysis approach.
We devise a novel support vector machine (SVM)-based predictive algorithm for this data structure. By treating each collection of fibers as a probability distribution, we can measure similarities between collections through a flexible kernel approach. Under the assumed relationship between the tumor status of image sections and that of tissue samples, the constructed SVM problem is nonconvex, and traditional algorithms cannot be applied. We propose two algorithms that trade off computational accuracy against efficiency to manage data of all sizes. The predictive performance of both algorithms is evaluated on the collagen fiber data set and additional simulation scenarios. We offer reproducible implementations of both algorithms in the R package mildsvm.
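As a rough illustration of measuring similarity between collections of fibers treated as probability distributions, the sketch below averages a pairwise kernel over all fiber pairs from two image sections, which equals the inner product of their empirical kernel mean embeddings. The abstract does not specify the kernel or the fiber features, so the Gaussian (RBF) kernel, the two-dimensional features, and the helper names `rbf`, `set_kernel`, and `sample_fibers` are assumptions made for illustration only. This is not the mildsvm implementation.

```python
import math
import random


def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def set_kernel(fibers_a, fibers_b, sigma=1.0):
    """Average pairwise kernel between two collections of fibers.

    This is the inner product of the empirical kernel mean embeddings
    of the two collections, so it compares them as distributions.
    """
    total = sum(rbf(x, y, sigma) for x in fibers_a for y in fibers_b)
    return total / (len(fibers_a) * len(fibers_b))


def sample_fibers(n, center, rng):
    """Simulate n hypothetical fibers with features scattered around `center`."""
    return [[rng.gauss(c, 0.5) for c in center] for _ in range(n)]


rng = random.Random(0)
# Three simulated image sections; the collections may differ in size.
section_a = sample_fibers(200, [0.0, 0.0], rng)
section_b = sample_fibers(150, [0.0, 0.0], rng)  # same distribution as a
section_c = sample_fibers(180, [3.0, 3.0], rng)  # shifted distribution

k_ab = set_kernel(section_a, section_b)
k_ac = set_kernel(section_a, section_c)
print(f"similar sections: {k_ab:.4f}, dissimilar sections: {k_ac:.4f}")
```

Because the kernel is averaged over all pairs, image sections with different numbers of fibers remain directly comparable, and the resulting section-level similarities could serve as entries of a kernel matrix for an SVM.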
Funding Statement
Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number T32LM012413 and the University of Wisconsin Carbone Cancer Center Support Grant P30 CA014520. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Citation
Sean Kent, Menggang Yu. "Nonconvex SVM for cancer diagnosis based on morphologic features of tumor microenvironment." Ann. Appl. Stat. 18 (3) 2187-2206, September 2024. https://doi.org/10.1214/24-AOAS1876
Information