The Annals of Applied Statistics

Statistical methods for tissue array images—algorithmic scoring and co-training

Donghui Yan, Pei Wang, Michael Linden, Beatrice Knudsen, and Timothy Randolph

Recent advances in tissue microarray technology have allowed immunohistochemistry to become a powerful medium-to-high-throughput analysis tool, particularly for the validation of diagnostic and prognostic biomarkers. However, as study size grows, manual evaluation of these assays becomes a prohibitive limitation: it vastly reduces throughput and greatly increases variability and expense. We propose an algorithm, Tissue Array Co-Occurrence Matrix Analysis (TACOMA), for quantifying cellular phenotypes based on textural regularity summarized by local inter-pixel relationships. The algorithm can easily be trained for any staining pattern, has no sensitive tuning parameters and can report the salient pixels in an image that contribute to its score. Pathologists' input via informative training patches is an important aspect of the algorithm, allowing it to be trained for any specific marker or cell type. With co-training, the error rate of TACOMA can be reduced substantially for a very small training sample (e.g., of size $30$). We give theoretical insights into the success of co-training via thinning of the feature set in a high-dimensional setting when there is “sufficient” redundancy among the features. TACOMA is flexible and transparent, and it provides a scoring process that can be evaluated with clarity and confidence. In a study based on an estrogen receptor (ER) marker, we show that TACOMA is comparable to, or outperforms, pathologists in terms of accuracy and repeatability.
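As the name Tissue Array Co-Occurrence Matrix Analysis suggests, the "local inter-pixel relationships" mentioned above are summarized with co-occurrence matrices: for a fixed pixel offset, such a matrix counts how often gray level i is followed by gray level j. The sketch below is a generic gray-level co-occurrence computation in plain Python, offered only as an illustration; it is not the authors' TACOMA code, and the function names, the offsets, the number of gray levels and the toy image are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

def cooccurrence_matrix(image: Sequence[Sequence[int]], dx: int, dy: int,
                        levels: int) -> List[List[int]]:
    """Count pairs (gray level i at (r, c), gray level j at (r + dy, c + dx))."""
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose neighbour falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts

def texture_features(image: Sequence[Sequence[int]],
                     offsets: Sequence[Tuple[int, int]],
                     levels: int) -> List[float]:
    """Flatten one normalized co-occurrence matrix per (dx, dy) offset into one vector."""
    features: List[float] = []
    for dx, dy in offsets:
        m = cooccurrence_matrix(image, dx, dy, levels)
        total = sum(map(sum, m)) or 1  # guard against an empty matrix
        features.extend(v / total for row in m for v in row)
    return features

# Hypothetical toy 4x4 image, already quantized to 3 gray levels.
toy = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 2, 2]]
glcm = cooccurrence_matrix(toy, dx=1, dy=0, levels=3)  # right-hand neighbour
print(glcm)  # [[2, 2, 1], [0, 2, 0], [0, 0, 5]]
feats = texture_features(toy, offsets=[(1, 0), (0, 1)], levels=3)
print(len(feats))  # 18
```

In the toy image most horizontal pairs fall on the diagonal of the matrix, because neighbouring pixels tend to share a gray level; vectors of this kind, computed for several offsets, are one plausible input for a downstream classifier.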

Article information

Ann. Appl. Stat., Volume 6, Number 3 (2012), 1280–1305.

First available in Project Euclid: 31 August 2012

Digital Object Identifier: doi:10.1214/12-AOAS543

Keywords: classification; ratio of separation; high-dimensional inference; co-training


Yan, Donghui; Wang, Pei; Linden, Michael; Knudsen, Beatrice; Randolph, Timothy. Statistical methods for tissue array images—algorithmic scoring and co-training. Ann. Appl. Stat. 6 (2012), no. 3, 1280–1305. doi:10.1214/12-AOAS543.

Supplemental materials

  • Supplement A: Assumption $A_1$, proof of Theorem 2 and simulations on thinning. We provide a detailed description of Assumption $A_1$, a sketch of the proof of Theorem 2, and simulations of the ratio of separation upon thinning under different settings.
  • Supplement B: TMA images with salient pixels marked. This supplement contains close-up views of several TMA images in which the salient pixels are highlighted.
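Supplement A concerns thinning of the feature set, which the abstract connects to the success of co-training when features are redundant. The sketch below reads "thinning" as randomly splitting the features into two disjoint subsets that serve as separate views; that reading is an assumption, as are the nearest-centroid classifiers, the distance-gap confidence margin and every function name. The loop is a simplified variant of classic co-training in which the two views take turns labeling the unlabeled point they are most confident about and adding it to one shared labeled set. It is not the procedure analyzed in the article.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def thin_features(n_features: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical thinning: randomly partition feature indices into two disjoint views."""
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    half = n_features // 2
    return sorted(idx[:half]), sorted(idx[half:])

def fit_centroids(X: List[Vector], y: List[int], view: List[int]) -> Dict[int, List[float]]:
    """Nearest-centroid classifier: per-class mean of the features in `view`."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(view))
        for k, j in enumerate(view):
            acc[k] += x[j]
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(cents: Dict[int, List[float]], x: Vector, view: List[int]) -> Tuple[int, float]:
    """Return (label of closest centroid, gap between the two smallest squared distances)."""
    dists = sorted(
        (sum((x[j] - c[k]) ** 2 for k, j in enumerate(view)), lab)
        for lab, c in cents.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def co_train(X_lab: List[Vector], y_lab: List[int], X_unlab: List[Vector], seed: int = 0):
    """Alternate views; each one labels its most confident unlabeled point for the other."""
    views = thin_features(len(X_lab[0]), seed)
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    turn = 0
    while pool:
        view = views[turn % 2]
        cents = fit_centroids(X, y, view)
        scored = [(predict(cents, x, view), i) for i, x in enumerate(pool)]
        (label, _), best = max(scored, key=lambda s: s[0][1])
        X.append(pool.pop(best))  # move the point into the shared labeled set
        y.append(label)
        turn += 1
    return X, y

# Hypothetical toy data: 6 redundant features, 2 labeled and 4 unlabeled points.
X_lab = [[0.0] * 6, [1.0] * 6]
y_lab = [0, 1]
X_unlab = [[0.1] * 6, [0.9] * 6,
           [0.2, 0.1, 0.0, 0.1, 0.2, 0.0],
           [0.8, 1.0, 0.9, 1.0, 0.8, 0.9]]
X_all, y_all = co_train(X_lab, y_lab, X_unlab)
print(list(zip(y_all, [round(sum(x), 2) for x in X_all])))
```

Because every feature here carries the same class signal, either half of the features suffices to label the unlabeled points, which is the kind of redundancy under which the abstract says thinning helps co-training.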