The Annals of Applied Statistics

A semiparametric approach to mixed outcome latent variable models: Estimating the association between cognition and regional brain volumes

Jonathan Gruhl, Elena A. Erosheva, and Paul K. Crane

Full-text: Open access


Multivariate data that combine binary, categorical, count and continuous outcomes are common in the social and health sciences. We propose a semiparametric Bayesian latent variable model for multivariate data of arbitrary type that does not require specification of conditional distributions. Drawing on the extended rank likelihood method by Hoff [Ann. Appl. Stat. 1 (2007) 265–283], we develop a semiparametric approach for latent variable modeling with mixed outcomes and propose associated Markov chain Monte Carlo estimation methods. Motivated by cognitive testing data, we focus on bifactor models, a special case of factor analysis. We employ our semiparametric Bayesian latent variable model to investigate the association between cognitive outcomes and MRI-measured regional brain volumes.
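As a rough illustration of the extended rank likelihood idea mentioned in the abstract, the sketch below shows one Gibbs-style sweep over a latent vector for a single outcome. It is not the authors' implementation. It assumes the usual construction from Hoff (2007): each latent value is drawn from a normal distribution truncated to the interval implied by the ordering of the observed data. The function names (`rank_bounds`, `sample_truncated_normal`, `update_latent`), the unit standard deviation, and the zero conditional means are all hypothetical; in a factor model the means would come from the loadings and factor scores.

```python
import random
from statistics import NormalDist

INF = float("inf")

def rank_bounds(y, z, i):
    """Interval for z[i] implied by the ordering of the observed y values."""
    lower = max((z[k] for k in range(len(y)) if y[k] < y[i]), default=-INF)
    upper = min((z[k] for k in range(len(y)) if y[k] > y[i]), default=INF)
    return lower, upper

def sample_truncated_normal(mean, sd, lower, upper, rng):
    """Inverse-CDF draw from N(mean, sd^2) restricted to [lower, upper]."""
    dist = NormalDist(mean, sd)
    p_lo = 0.0 if lower == -INF else dist.cdf(lower)
    p_hi = 1.0 if upper == INF else dist.cdf(upper)
    u = rng.uniform(p_lo, p_hi)
    # Keep u strictly inside (0, 1) so inv_cdf is defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    # Clamp to guard against floating-point drift at the interval ends.
    return min(max(dist.inv_cdf(u), lower), upper)

def initial_latent(y):
    """Starting values whose ordering agrees with y (ties share a value)."""
    n = len(y)
    std = NormalDist()
    return [std.inv_cdf(sum(yk <= yi for yk in y) / (n + 1)) for yi in y]

def update_latent(y, z, means, rng, sd=1.0):
    """One sweep: redraw each z[i] given the others, preserving the ranks of y."""
    for i in range(len(y)):
        lower, upper = rank_bounds(y, z, i)
        z[i] = sample_truncated_normal(means[i], sd, lower, upper, rng)
    return z

if __name__ == "__main__":
    rng = random.Random(0)
    y = [0, 1, 1, 3, 7]          # e.g. a count outcome; only its ordering is used
    z = initial_latent(y)
    means = [0.0] * len(y)       # assumed conditional means (hypothetical)
    for _ in range(10):
        update_latent(y, z, means, rng)
    print(z)
```

Only the ordering of `y` enters the update, so the same code applies unchanged to binary, ordinal, count, or continuous outcomes. This is why no conditional distribution for the observed data needs to be specified.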

Article information

Ann. Appl. Stat., Volume 7, Number 4 (2013), 2361–2383.

First available in Project Euclid: 23 December 2013


Keywords: Latent variable model; Bayesian hierarchical model; extended rank likelihood; cognitive outcomes


Gruhl, Jonathan; Erosheva, Elena A.; Crane, Paul K. A semiparametric approach to mixed outcome latent variable models: Estimating the association between cognition and regional brain volumes. Ann. Appl. Stat. 7 (2013), no. 4, 2361–2383. doi:10.1214/13-AOAS675.



  • Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley, Hoboken, NJ.
  • Bartholomew, D., Knott, M. and Moustaki, I. (2011). Latent Variable Models and Factor Analysis: A Unified Approach, 3rd ed. Wiley, Chichester.
  • Bollen, K. A. (1989). Structural Equations with Latent Variables. Wiley, New York.
  • Cardenas, V. A., Ezekiel, F., Di Sclafani, V., Gomberg, B. and Fein, G. (2001). Reliability of tissue volumes and their spatial distribution for segmented magnetic resonance images. Psychiatry Research: Neuroimaging 106 193–205.
  • Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research 1 245–276.
  • Chui, H. C. (2007). Subcortical ischemic vascular dementia. Neurol. Clin. 25 717–740, vi.
  • Chui, H. C., Zarow, C., Mack, W. J., Ellis, W. G., Zheng, L., Jagust, W. J., Mungas, D., Reed, B. R., Kramer, J. H., DeCarli, C. C. et al. (2006). Cognitive impact of subcortical vascular and Alzheimer’s disease pathology. Annals of Neurology 60 677.
  • Congdon, P. (2003). Applied Bayesian Modelling. Wiley, Chichester.
  • Congdon, P. (2006). Bayesian Statistical Modelling, 2nd ed. Wiley, Chichester.
  • Dobra, A. and Lenkoski, A. (2011). Copula Gaussian graphical models and their application to modeling functional disability data. Ann. Appl. Stat. 5 969–993.
  • Dunn, J. E. (1973). A note on a sufficiency condition for uniqueness of restricted factor matrix. Psychometrika 38 141–143.
  • Dunson, D. B. (2003). Dynamic latent trait models for multidimensional longitudinal data. J. Amer. Statist. Assoc. 98 555–563.
  • Dunson, D. B. et al. (2006). Efficient Bayesian model averaging in factor analysis. Technical report, Duke Univ., Durham, NC.
  • Erosheva, E. and Curtis, S. M. (2011). Specification of rotational constraints in Bayesian confirmatory factor analysis. Technical Report No. 589, Univ. Washington, Seattle, WA.
  • Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In Bayesian Statistics, 4 (Peñíscola, 1991) (J. M. Bernardo, J. Berger, A. P. Dawid and J. F. M. Smith, eds.) 169–193. Oxford Univ. Press, New York.
  • Geweke, J. and Zhou, G. (1996). Measuring the pricing error of the arbitrage pricing theory. Review of Financial Studies 9 557–587.
  • Ghosh, J. and Dunson, D. B. (2008). Bayesian model selection in factor analytic models. In Random Effect and Latent Variable Model Selection 151–163. Springer, New York.
  • Ghosh, J. and Dunson, D. B. (2009). Default prior distributions and efficient posterior computation in Bayesian factor analysis. J. Comput. Graph. Statist. 18 306–320.
  • Gruhl, J., Erosheva, E. and Crane, P. (2010). Analyzing cognitive testing data with extensions of item response theory models. Presented at the Joint Statistical Meetings, Vancouver, Canada, August 3, 2010.
  • Gruhl, J., Erosheva, E. and Crane, P. (2011). A semiparametric Bayesian latent trait model for multivariate mixed type data. In International Meeting of the Psychometric Society.
  • Guttman, L. (1954). Some necessary conditions for common-factor analysis. Psychometrika 19 149–161.
  • Hachinski, V., Iadecola, C., Petersen, R. C., Breteler, M. M., Nyenhuis, D. L., Black, S. E., Powers, W. J., DeCarli, C., Merino, J. G., Kalaria, R. N. et al. (2006). National Institute of Neurological Disorders and Stroke–Canadian Stroke Network vascular cognitive impairment harmonization standards. Stroke 37 2220–2241.
  • Hoff, P. D. (2007). Extending the rank likelihood for semiparametric copula estimation. Ann. Appl. Stat. 1 265–283.
  • Hoff, P. D. (2009). A First Course in Bayesian Statistical Methods. Springer, New York.
  • Holzinger, K. J. and Swineford, F. (1937). The bi-factor method. Psychometrika 2 41–54.
  • Jennrich, R. I. (1978). Rotational equivalence of factor loading matrices with specified values. Psychometrika 43 421–426.
  • Jennrich, R. I. and Bentler, P. M. (2011). Exploratory bi-factor analysis. Psychometrika 76 537–549.
  • Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika 34 183–202.
  • Klüppelberg, C. and Kuhn, G. (2009). Copula structure analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 737–753.
  • Knowles, D. and Ghahramani, Z. (2011). Nonparametric Bayesian sparse factor models with application to gene expression modeling. Ann. Appl. Stat. 5 1534–1552.
  • Kuczynski, B., Targan, E., Madison, C., Weiner, M., Zhang, Y., Reed, B., Chui, H. C. and Jagust, W. (2010). White matter integrity and cortical metabolic associations in aging and dementia. Alzheimer’s and Dementia 6 54–62.
  • Liu, C., Rubin, D. B. and Wu, Y. N. (1998). Parameter expansion to accelerate EM: The PX-EM algorithm. Biometrika 85 755–770.
  • Liu, J. S. and Wu, Y. N. (1999). Parameter expansion for data augmentation. J. Amer. Statist. Assoc. 94 1264–1274.
  • Loken, E. (2005). Identification constraints and inference in factor models. Struct. Equ. Model. 12 232–244.
  • Lopes, H. F. and West, M. (2004). Bayesian model assessment in factor analysis. Statist. Sinica 14 41–67.
  • Millsap, R. E. (2001). When trivial constraints are not trivial: The choice of uniqueness constraints in confirmatory factor analysis. Struct. Equ. Model. 8 1–17.
  • Morris, J. C. (1993). The Clinical Dementia Rating (CDR): Current version and scoring rules. Neurology 43 2412–2414.
  • Morris, J. C. (1997). Clinical dementia rating: A reliable and valid diagnostic and staging measure for dementia of the Alzheimer type. Int. Psychogeriatr. 9 Suppl 1 173–176; discussion 177–178.
  • Moustaki, I. and Knott, M. (2000). Generalized latent trait models. Psychometrika 65 391–411.
  • Mungas, D., Harvey, D., Reed, B. R., Jagust, W. J., DeCarli, C., Beckett, L., Mack, W. J., Kramer, J. H., Weiner, M. W., Schuff, N. et al. (2005). Longitudinal volumetric MRI change and rate of cognitive decline. Neurology 65 565–571.
  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Appl. Psychol. Meas. 16 159.
  • Murray, J. S., Dunson, D. B., Carin, L. and Lucas, J. E. (2013). Bayesian Gaussian copula factor models for mixed data. J. Amer. Statist. Assoc. 108 656–665.
  • Pettitt, A. N. (1982). Inference for the linear model using a likelihood based on ranks. J. R. Stat. Soc. Ser. B Stat. Methodol. 44 234–243.
  • Raftery, A. E. and Lewis, S. M. (1995). The number of iterations, convergence diagnostics and generic Metropolis algorithms. In Practical Markov Chain Monte Carlo (W. R. Gilks, D. J. Spiegelhalter and S. Richardson, eds.). Chapman & Hall, London, UK.
  • Rai, P. and Daumé III, H. (2009). The infinite hierarchical factor regression model. Available at arXiv:0908.0570.
  • Reise, S. P., Morizot, J. and Hays, R. D. (2007). The role of the bifactor model in resolving dimensionality issues in health outcomes measures. Qual. Life Res. 16 Suppl 1 19–31.
  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement 34 1–100.
  • Sammel, M. D., Ryan, L. M. and Legler, J. M. (1997). Latent variable models for mixed discrete and continuous outcomes. J. R. Stat. Soc. Ser. B Stat. Methodol. 59 667–678.
  • Shi, J. Q. and Lee, S. Y. (1998). Bayesian sampling-based approach for factor analysis models with continuous and polytomous data. British J. Math. Statist. Psych. 51 233–252.
  • Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. Chapman & Hall/CRC, Boca Raton, FL.
  • Stephens, M. (2000). Dealing with label switching in mixture models. J. R. Stat. Soc. Ser. B Stat. Methodol. 62 795–809.
  • van der Linden, W. J. and Hambleton, R. K., eds. (1997). Handbook of Modern Item Response Theory. Springer, New York.
  • West, M. (1987). On scale mixtures of normal distributions. Biometrika 74 646–648.