Bayesian Analysis

Bayesian Network Marker Selection via the Thresholded Graph Laplacian Gaussian Prior

Qingpo Cai, Jian Kang, and Tianwei Yu

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access

Abstract

Selecting informative nodes over large-scale networks has become increasingly important in many research areas. Most existing methods focus on the local network structure and incur heavy computational costs for large-scale problems. In this work, we propose a novel prior model for Bayesian network marker selection in the generalized linear model (GLM) framework: the Thresholded Graph Laplacian Gaussian (TGLG) prior, which adopts the graph Laplacian matrix to characterize the conditional dependence between neighboring markers while accounting for the global network structure. Under mild conditions, we show that the proposed model enjoys posterior consistency with a diverging number of edges and nodes in the network. We also develop a Metropolis-adjusted Langevin algorithm (MALA) for efficient posterior computation, which is scalable to large-scale networks. We illustrate the advantages of the proposed method over existing alternatives via extensive simulation studies and an analysis of the breast cancer gene expression dataset in The Cancer Genome Atlas (TCGA).
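
To make the idea of a thresholded, Laplacian-structured prior concrete, the following is a minimal sketch and not the authors' specification, which the abstract does not give. It assumes a normalized graph Laplacian L, a latent vector gamma drawn from a Gaussian with precision (L + eps I) / sigma_gamma^2, independent Gaussian effect sizes alpha, and coefficients beta_j = alpha_j * 1(|gamma_j| > lam). The function names, the ridge term eps, and all parameter values are hypothetical choices for illustration.

```python
import numpy as np


def normalized_laplacian(adj):
    """Normalized graph Laplacian I - D^{-1/2} A D^{-1/2}.

    Assumes `adj` is a symmetric adjacency matrix with a zero diagonal.
    Isolated nodes get a zero row and column.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    connected = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    lap = -adj * inv_sqrt[:, None] * inv_sqrt[None, :]
    lap[np.diag_indices_from(lap)] = np.where(connected, 1.0, 0.0)
    return lap


def sample_thresholded_coefficients(adj, lam, eps=1e-2, sigma_gamma=1.0,
                                    sigma_alpha=1.0, rng=None):
    """Draw one coefficient vector from the illustrative thresholded prior.

    All modelling choices here (eps, sigma_gamma, sigma_alpha, hard
    thresholding at lam) are assumptions made for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    adj = np.asarray(adj, dtype=float)
    p = adj.shape[0]

    # Precision of the latent field: the Laplacian couples neighboring
    # nodes, and eps * I makes the matrix positive definite.
    precision = (normalized_laplacian(adj) + eps * np.eye(p)) / sigma_gamma**2

    # If precision = C C^T, then gamma = C^{-T} z has covariance precision^{-1}.
    chol = np.linalg.cholesky(precision)
    gamma = np.linalg.solve(chol.T, rng.standard_normal(p))

    # Independent effect sizes, kept only where the latent field exceeds lam.
    alpha = sigma_alpha * rng.standard_normal(p)
    selected = np.abs(gamma) > lam
    beta = alpha * selected
    return beta, selected, gamma


if __name__ == "__main__":
    # Path graph on five nodes: 0 - 1 - 2 - 3 - 4.
    path = np.zeros((5, 5))
    for i in range(4):
        path[i, i + 1] = path[i + 1, i] = 1.0
    beta, selected, gamma = sample_thresholded_coefficients(
        path, lam=1.0, rng=np.random.default_rng(0)
    )
    print("selected nodes:", np.flatnonzero(selected))
    print("beta:", beta)
```

Because neighboring entries of gamma are positively correlated under this precision matrix, connected nodes tend to cross the threshold together, which is one way a prior of this kind can favor selecting markers in connected groups.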

Article information

Source
Bayesian Anal., Advance publication (2018), 24 pages.

Dates
First available in Project Euclid: 5 January 2019

Permanent link to this document
https://projecteuclid.org/euclid.ba/1546657330

Digital Object Identifier
doi:10.1214/18-BA1142

Keywords
gene network; generalized linear model; network marker selection; posterior consistency; thresholded graph Laplacian Gaussian prior

Rights
Creative Commons Attribution 4.0 International License.

Citation

Cai, Qingpo; Kang, Jian; Yu, Tianwei. Bayesian Network Marker Selection via the Thresholded Graph Laplacian Gaussian Prior. Bayesian Anal., advance publication, 5 January 2019. doi:10.1214/18-BA1142. https://projecteuclid.org/euclid.ba/1546657330



Supplemental materials

  • Supplementary file 1 for “Bayesian network marker selection via the thresholded graph Laplacian Gaussian prior”. The supplementary materials, available at Bayesian Analysis online, include proofs of the theoretical results.
  • Supplementary file 2 for “Bayesian network marker selection via the thresholded graph Laplacian Gaussian prior”. The supplementary materials, available at Bayesian Analysis online, include results for the real data analysis.