The Annals of Applied Statistics

Modeling hybrid traits for comorbidity and genetic studies of alcohol and nicotine co-dependence

Heping Zhang, Dungang Liu, Jiwei Zhao, and Xuan Bi



We propose a novel multivariate model for analyzing hybrid traits and identifying genetic factors for comorbid conditions. Comorbidity is a common phenomenon in mental health in which an individual suffers from multiple disorders simultaneously. For example, in the Study of Addiction: Genetics and Environment (SAGE), alcohol and nicotine addiction were recorded through multiple assessments that we refer to as hybrid traits. Statistical inference for studying the genetic basis of hybrid traits has not been well developed. Recent rank-based methods have been utilized for conducting association analyses of hybrid traits but do not inform the strength or direction of effects. To overcome this limitation, a parametric modeling framework is imperative. Although such parametric frameworks have been proposed in theory, they are neither well developed nor extensively used in practice due to their reliance on complicated likelihood functions that have high computational complexity. Many existing parametric frameworks tend to instead use pseudo-likelihoods to reduce computational burdens. Here, we develop a model fitting algorithm for the full likelihood. Our extensive simulation studies demonstrate that inference based on the full likelihood can control the type-I error rate, and gains power and improves the effect size estimation when compared with several existing methods for hybrid models. These advantages remain even if the distribution of the latent variables is misspecified. After analyzing the SAGE data, we identify three genetic variants (rs7672861, rs958331, rs879330) that are significantly associated with the comorbidity of alcohol and nicotine addiction at the chromosome-wide level. Moreover, our approach has greater power in this analysis than several existing methods for hybrid traits. Although the analysis of the SAGE data motivated us to develop the model, it can be broadly applied to analyze any hybrid responses.

Article information

Ann. Appl. Stat., Volume 12, Number 4 (2018), 2359-2378.

Received: September 2016
Revised: January 2018
First available in Project Euclid: 13 November 2018


Keywords: Comorbidity, association, EM algorithm, latent variable, ordinal outcome


Zhang, Heping; Liu, Dungang; Zhao, Jiwei; Bi, Xuan. Modeling hybrid traits for comorbidity and genetic studies of alcohol and nicotine co-dependence. Ann. Appl. Stat. 12 (2018), no. 4, 2359–2378. doi:10.1214/18-AOAS1156.




Supplemental materials

  • Supplement to “Modeling hybrid traits for comorbidity and genetic studies of alcohol and nicotine co-dependence”. Supplementary materials provide all the technical details for the model fitting.