The Annals of Applied Statistics

Bayesian analysis of dynamic item response models in educational testing

Xiaojing Wang, James O. Berger, and Donald S. Burdick


Item response theory (IRT) models have been widely used in educational measurement testing. When repeated observations are available for individuals over time, a dynamic structure for the latent trait of ability needs to be incorporated into the model to accommodate changes in ability. Two other complications often arise in such settings: the common assumption that test results are conditionally independent, given ability and item difficulty, may be violated, and test item difficulties may be only partially specified and therefore subject to uncertainty. For time series of dichotomous response data, a new class of state space models, called Dynamic Item Response (DIR) models, is proposed. The models can be applied either retrospectively to the full data or on-line, in cases where real-time prediction is needed. The models are studied through simulated examples and applied to a large collection of reading test data obtained from MetaMetrics, Inc.
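
To make the state space idea concrete, the sketch below simulates dichotomous responses from a latent ability that changes over time. It is only an illustration under simple assumptions and is not the DIR model of the paper: the random-walk state equation, the logistic (Rasch-type) response function, the standard normal item difficulties, and every name and default value (`simulate_dynamic_rasch`, `drift`, `innovation_sd`, and so on) are hypothetical choices made for this example.

```python
import math
import random


def simulate_dynamic_rasch(n_days=50, items_per_day=10, drift=0.05,
                           innovation_sd=0.1, seed=0):
    """Simulate 0/1 responses from a random-walk ability model.

    Illustrative assumptions (not taken from the paper):
      state:        theta_t = theta_{t-1} + drift + N(0, innovation_sd^2)
      observation:  P(correct) = 1 / (1 + exp(-(theta_t - difficulty)))
    """
    rng = random.Random(seed)
    theta = 0.0  # assumed starting ability
    abilities, responses = [], []
    for _ in range(n_days):
        # State equation: ability drifts upward with random innovations.
        theta += drift + rng.gauss(0.0, innovation_sd)
        abilities.append(theta)
        day = []
        for _ in range(items_per_day):
            # Assumed: item difficulties are drawn from a standard normal.
            difficulty = rng.gauss(0.0, 1.0)
            p_correct = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
            day.append(1 if rng.random() < p_correct else 0)
        responses.append(day)
    return abilities, responses


if __name__ == "__main__":
    abilities, responses = simulate_dynamic_rasch()
    for t in (0, len(abilities) - 1):
        share = sum(responses[t]) / len(responses[t])
        print(f"day {t + 1}: ability={abilities[t]:.2f}, share correct={share:.2f}")
```

Fitting such a model retrospectively would use all days at once, while on-line use would update the estimate of the current ability as each new day of responses arrives; the simulation covers only the data-generating side.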

Article information

Ann. Appl. Stat., Volume 7, Number 1 (2013), 126-153.

First available in Project Euclid: 9 April 2013

Keywords: IRT models; local dependence; random effects; dynamic linear models; Gibbs sampling; forward filtering and backward sampling


Wang, Xiaojing; Berger, James O.; Burdick, Donald S. Bayesian analysis of dynamic item response models in educational testing. Ann. Appl. Stat. 7 (2013), no. 1, 126–153. doi:10.1214/12-AOAS608.

