Annals of Applied Statistics

Modeling heterogeneity in ranked responses by nonparametric maximum likelihood: How do Europeans get their scientific knowledge?

Brian Francis, Regina Dittrich, and Reinhold Hatzinger

Full-text: Open access


This paper is motivated by a Eurobarometer survey on science knowledge. As part of the survey, respondents were asked to rank sources of science information in order of importance. The official statistical analysis of these data, however, failed to use the complete ranking information. We instead propose a method that treats ranked data as a set of paired comparisons, which places the problem in the standard framework of generalized linear models and also allows respondent covariates to be incorporated.
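To make the idea of treating a ranking as a set of paired comparisons concrete, the following is a minimal sketch, not the authors' implementation. The function name and the item labels are illustrative assumptions. Each ranking of J items is expanded into J(J-1)/2 comparisons coded +1 or -1.

```python
from itertools import combinations


def ranking_to_paired_comparisons(ranking):
    """Expand a ranking (most important item first) into paired comparisons.

    Returns a dict mapping each pair (a, b), with a before b in sorted
    order, to +1 if a is ranked above b and -1 otherwise.
    """
    position = {item: idx for idx, item in enumerate(ranking)}
    comparisons = {}
    for a, b in combinations(sorted(ranking), 2):
        comparisons[(a, b)] = 1 if position[a] < position[b] else -1
    return comparisons


# Hypothetical respondent who ranks three sources (labels are illustrative only).
example = ranking_to_paired_comparisons(["tv", "press", "internet"])
```

Because the comparisons are derived from one ranking, they are always transitive; only the J! patterns that correspond to a valid ranking can occur.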

An extension is proposed to allow for heterogeneity in the ranked responses. The resulting model uses a nonparametric formulation of the random effects structure, fitted using the EM algorithm. Each mass point is multivalued, with a parameter for each item. The resultant model is equivalent to a covariate latent class model, where the latent class profiles are provided by the mass point components and the covariates act on the class profiles. This provides an alternative interpretation of the fitted model. The approach is also suitable for paired comparison data.
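The latent class reading of the model can be sketched as a small EM loop. This is an illustrative simplification under stated assumptions, not the algorithm described in the paper or its supplement: it omits covariates, fixes the number of classes in advance, assumes each class assigns a ranking a probability proportional to exp(sum over item pairs of y_ij (lambda_i - lambda_j)), and replaces a full M-step with a single gradient step (a generalized EM). All function names, the step size, and the data are hypothetical.

```python
import math
import random
from itertools import permutations


def scores(ranking, n_items):
    """For each item: (number of items it beats) - (number of items that beat it)."""
    s = [0] * n_items
    for pos, item in enumerate(ranking):
        s[item] = (n_items - 1) - 2 * pos
    return s


def class_probs(lam, score_table):
    """Probability of every possible ranking given one class's item parameters."""
    eta = [sum(l * s for l, s in zip(lam, row)) for row in score_table]
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [x / total for x in w]


def fit_latent_classes(data, n_items, n_classes, n_iter=100, step=0.1, seed=0):
    rng = random.Random(seed)
    all_rankings = list(permutations(range(n_items)))
    index = {r: i for i, r in enumerate(all_rankings)}
    obs = [index[tuple(r)] for r in data]
    score_table = [scores(r, n_items) for r in all_rankings]
    pi = [1.0 / n_classes] * n_classes
    # One parameter per item in every class (a multivalued mass point).
    lam = [[rng.gauss(0, 0.5) for _ in range(n_items)] for _ in range(n_classes)]
    loglik_trace = []
    for _ in range(n_iter):
        probs = [class_probs(lam[k], score_table) for k in range(n_classes)]
        # E-step: posterior class membership for each respondent.
        post, loglik = [], 0.0
        for o in obs:
            joint = [pi[k] * probs[k][o] for k in range(n_classes)]
            total = sum(joint)
            loglik += math.log(total)
            post.append([j / total for j in joint])
        loglik_trace.append(loglik)
        # M-step: closed form for class sizes, one gradient step for item parameters.
        for k in range(n_classes):
            wk = sum(p[k] for p in post)
            pi[k] = wk / len(obs)
            for i in range(n_items):
                observed = sum(p[k] * score_table[o][i] for p, o in zip(post, obs)) / wk
                expected = sum(q * row[i] for q, row in zip(probs[k], score_table))
                lam[k][i] += step * (observed - expected)
    return pi, lam, loglik_trace


# Synthetic rankings of three items (most preferred first); not survey data.
data = [(0, 1, 2)] * 30 + [(2, 1, 0)] * 30 + [(0, 2, 1)] * 5
pi, lam, trace = fit_latent_classes(data, n_items=3, n_classes=2)
```

The item parameters within a class are identified only up to an additive constant, so only their differences are meaningful. The step size of 0.1 is a tuning assumption chosen for three items and may need to be reduced for longer rankings.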

Article information

Ann. Appl. Stat., Volume 4, Number 4 (2010), 2181-2202.

First available in Project Euclid: 4 January 2011

Keywords: Ranked data; random effects; NPML; paired comparisons; Bradley–Terry model; latent class analysis; mixture of experts; Eurobarometer


Francis, Brian; Dittrich, Regina; Hatzinger, Reinhold. Modeling heterogeneity in ranked responses by nonparametric maximum likelihood: How do Europeans get their scientific knowledge? Ann. Appl. Stat. 4 (2010), no. 4, 2181–2202. doi:10.1214/10-AOAS366.




Supplemental materials

  • Supplementary material: The EM algorithm for NPML random effects in ranked data. We provide a detailed description of the use of the EM algorithm for fitting nonparametric random effects for ranked data by maximum likelihood.