Statistical methods for tissue array images—algorithmic scoring and co-training
September 2012
Donghui Yan, Pei Wang, Michael Linden, Beatrice Knudsen, Timothy Randolph
Ann. Appl. Stat. 6(3): 1280-1305 (September 2012). DOI: 10.1214/12-AOAS543


Recent advances in tissue microarray technology have allowed immunohistochemistry to become a powerful medium-to-high-throughput analysis tool, particularly for validating diagnostic and prognostic biomarkers. However, as study size grows, manual evaluation of these assays becomes prohibitive: it vastly reduces throughput and greatly increases variability and expense. We propose an algorithm, Tissue Array Co-Occurrence Matrix Analysis (TACOMA), for quantifying cellular phenotypes based on textural regularity summarized by local inter-pixel relationships. The algorithm can be easily trained for any staining pattern, has no sensitive tuning parameters, and can report the salient pixels in an image that contribute to its score. Pathologists’ input, in the form of informative training patches, is an important aspect of the algorithm because it allows training for any specific marker or cell type. With co-training, the error rate of TACOMA can be reduced substantially for a very small training sample (e.g., of size 30). We give theoretical insights into the success of co-training via thinning of the feature set in a high-dimensional setting when there is “sufficient” redundancy among the features. TACOMA is flexible and transparent, and it provides a scoring process that can be evaluated with clarity and confidence. In a study based on an estrogen receptor (ER) marker, we show that TACOMA is comparable to, or better than, pathologists in terms of accuracy and repeatability.
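The abstract names co-occurrence matrices of local inter-pixel relationships as TACOMA's textural summary and mentions co-training via thinning of the feature set, but gives no implementation details. The sketch below, in plain Python with no third-party libraries, illustrates what a co-occurrence feature vector for an image patch can look like and how such a feature set might be thinned into two random views for co-training. The quantization rule, the number of gray levels, the offsets (0, 1), (1, 0), (1, 1), and the helper names `quantize`, `glcm`, `glcm_features`, and `split_views` are all hypothetical choices made for illustration, not the authors' specification.

```python
import random
from typing import List, Sequence, Tuple

# Assumption: an image patch is a 2-D list of integer gray levels.
Image = List[List[int]]


def quantize(image: Image, levels: int, max_value: int = 255) -> Image:
    """Hypothetical preprocessing: map raw intensities 0..max_value to levels 0..levels-1."""
    return [[p * levels // (max_value + 1) for p in row] for row in image]


def glcm(image: Image, levels: int,
         offset: Tuple[int, int] = (0, 1)) -> List[List[float]]:
    """Normalized co-occurrence matrix for one pixel offset (dr, dc).

    Entry [i][j] is the fraction of in-bounds pixel pairs (p, p + offset)
    in which p has gray level i and its neighbour has gray level j.
    """
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[x / total for x in row] for row in counts]


def glcm_features(image: Image, levels: int,
                  offsets: Sequence[Tuple[int, int]] = ((0, 1), (1, 0), (1, 1))
                  ) -> List[float]:
    """Concatenate the flattened co-occurrence matrices for several offsets."""
    features: List[float] = []
    for off in offsets:
        features.extend(x for row in glcm(image, levels, off) for x in row)
    return features


def split_views(n_features: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical thinning step: randomly split feature indices into two
    disjoint views, e.g. one per classifier in a co-training loop."""
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    half = n_features // 2
    return sorted(idx[:half]), sorted(idx[half:])


if __name__ == "__main__":
    patch = [[0, 0, 1], [0, 1, 1], [1, 1, 1]]
    print(glcm(patch, levels=2))    # horizontal pairs: ≈ [[0.167, 0.333], [0.0, 0.5]]
    feats = glcm_features(patch, levels=2)
    print(len(feats))               # 3 offsets x 2 x 2 levels = 12
    print(split_views(len(feats)))  # two disjoint index sets covering 0..11
```

Thinning here is a uniform random split of feature indices. The abstract's theoretical claim concerns when thinning succeeds given "sufficient" redundancy among features, and the paper's actual splitting scheme may differ from this one.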



Donghui Yan. Pei Wang. Michael Linden. Beatrice Knudsen. Timothy Randolph. "Statistical methods for tissue array images—algorithmic scoring and co-training." Ann. Appl. Stat. 6 (3) 1280 - 1305, September 2012.


Published: September 2012
First available in Project Euclid: 31 August 2012

zbMATH: 1254.92033
MathSciNet: MR3012530
Digital Object Identifier: 10.1214/12-AOAS543

Keywords: classification, co-training, high-dimensional inference, ratio of separation

Rights: Copyright © 2012 Institute of Mathematical Statistics
