The Annals of Applied Statistics

Cox regression with exclusion frequency-based weights to identify neuroimaging markers relevant to Huntington’s disease onset

Tanya P. Garcia and Samuel Müller


Abstract

Biomedical studies of neuroimaging and genomics collect large amounts of data on a small subset of subjects so as to not miss informative predictors. An important goal is identifying those predictors that provide better visualization of the data and that could serve as cost-effective measures for future clinical trials. Identifying such predictors is challenging, however, when the predictors are naturally interrelated and the response is a failure time prone to censoring. We propose to handle these challenges with a novel variable selection technique. Our approach casts the problem into several smaller dimensional settings and extracts from this intermediary step the relative importance of each predictor through data-driven weights called exclusion frequencies. The exclusion frequencies are used as weights in a weighted Lasso, and results yield low false discovery rates and a high geometric mean of sensitivity and specificity. We illustrate the method’s advantages over existing ones in an extensive simulation study, and use the method to identify relevant neuroimaging markers associated with Huntington’s disease onset.
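The performance measures named in the abstract (false discovery rate, sensitivity, specificity, and their geometric mean) can be illustrated with a minimal sketch. It uses the standard textbook definitions, which may differ in detail from those in the paper, for example in how the false discovery rate is defined when nothing is selected. The function name `selection_metrics`, its arguments, and the toy index sets are hypothetical and are not taken from the article.

```python
from math import sqrt


def selection_metrics(selected, truly_relevant, n_predictors):
    """Compare a selected predictor set with the truly relevant set.

    Assumption: predictors are identified by integer indices and
    n_predictors is the total number of candidate predictors.
    """
    selected = set(selected)
    truly_relevant = set(truly_relevant)

    tp = len(selected & truly_relevant)   # relevant and selected
    fp = len(selected - truly_relevant)   # irrelevant but selected
    fn = len(truly_relevant - selected)   # relevant but missed
    tn = n_predictors - tp - fp - fn      # irrelevant and not selected

    # Convention assumed here: a ratio with an empty denominator is set to 0.
    fdr = fp / (tp + fp) if (tp + fp) > 0 else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    specificity = tn / (tn + fp) if (tn + fp) > 0 else 0.0

    return {
        "fdr": fdr,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": sqrt(sensitivity * specificity),
    }


# Toy example: 10 predictors, 4 truly relevant, 4 selected (3 correct).
metrics = selection_metrics(
    selected=[0, 1, 2, 7],
    truly_relevant=[0, 1, 2, 3],
    n_predictors=10,
)
print(metrics)
```

A selection method does well on these measures when it keeps the false discovery rate low while keeping the geometric mean high, which requires both sensitivity and specificity to be high at the same time.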

Article information

Source
Ann. Appl. Stat., Volume 10, Number 4 (2016), 2130-2156.

Dates
Received: September 2015
Revised: July 2016
First available in Project Euclid: 5 January 2017

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1483606854

Digital Object Identifier
doi:10.1214/16-AOAS967

Mathematical Reviews number (MathSciNet)
MR3592051

Zentralblatt MATH identifier
06688771

Keywords
Exclusion frequency; model selection; neuroimaging; proportional hazards model; weighted lasso

Citation

Garcia, Tanya P.; Müller, Samuel. Cox regression with exclusion frequency-based weights to identify neuroimaging markers relevant to Huntington’s disease onset. Ann. Appl. Stat. 10 (2016), no. 4, 2130–2156. doi:10.1214/16-AOAS967. https://projecteuclid.org/euclid.aoas/1483606854


