The Annals of Applied Statistics

The role of mastery learning in an intelligent tutoring system: Principal stratification on a latent variable

Adam C. Sales and John F. Pane

Students in Algebra I classrooms typically learn at different rates and struggle at different points in the curriculum, a common challenge for math teachers. Cognitive Tutor Algebra I (CTA1), an educational computer program, addresses this heterogeneity through what its developers term “mastery learning”: students progress from one section of the curriculum to the next by demonstrating appropriate “mastery” at each stage. However, when students are unable to master a section’s skills even after trying many problems, they are automatically promoted to the next section anyway. Does promotion without mastery impair the program’s effectiveness?

At least in certain domains, CTA1 was recently shown to improve student learning on average in a randomized effectiveness study. This paper uses student log data from that study in a continuous principal stratification model to estimate the relationship between students’ potential mastery and the CTA1 treatment effect. In contrast to extant principal stratification applications, a student’s propensity to master worked sections here is never directly observed. Consequently, we embed an item-response model, which measures students’ potential mastery, within the larger principal stratification model. We find that the tutor may, in fact, be more effective for students who are more frequently promoted (despite unsuccessfully completing sections of the material). However, since these students are distinctive in their educational strength (as well as in other respects), it remains unclear whether this enhanced effectiveness can be directly attributed to aspects of the mastery learning program.
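The structure described above, an item-response model nested inside a principal stratification model, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model or their Stan code: it assumes a Rasch-type mastery probability, a treatment effect that is linear in the latent mastery propensity, and mastery data observed only for treated students. The function names, the parameters `tau0` and `tau1`, the section difficulties, and all numeric values are hypothetical.

```python
import math
import random


def mastery_probability(eta, delta):
    """Rasch-type probability that a student with latent mastery
    propensity `eta` masters a section of difficulty `delta`."""
    return 1.0 / (1.0 + math.exp(-(eta - delta)))


def treatment_effect(eta, tau0, tau1):
    """Hypothetical principal effect: the treatment effect is assumed
    to vary linearly with the latent propensity `eta`."""
    return tau0 + tau1 * eta


def simulate(n_students=1000, difficulties=(-1.0, 0.0, 1.0),
             tau0=0.2, tau1=-0.1, seed=0):
    """Simulate a randomized study in which section mastery is logged
    only for treated students, while `eta` exists for everyone."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_students):
        eta = rng.gauss(0.0, 1.0)      # latent, never observed directly
        treated = rng.random() < 0.5   # random assignment
        if treated:
            # Mastery indicators exist only under treatment.
            mastered = [rng.random() < mastery_probability(eta, d)
                        for d in difficulties]
            effect = treatment_effect(eta, tau0, tau1)
        else:
            mastered = None
            effect = 0.0
        # Outcome: illustrative dependence on eta, plus effect and noise.
        outcome = 0.5 * eta + effect + rng.gauss(0.0, 1.0)
        rows.append({"eta": eta, "treated": treated,
                     "mastered": mastered, "outcome": outcome})
    return rows
```

In this toy setup a negative `tau1` corresponds to larger effects for students with lower mastery propensity. Because `mastered` is missing for control students, estimating how the effect varies with `eta` requires a model for `eta` in both arms, which is the role the embedded item-response model plays in the abstract's description.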

Article information

Ann. Appl. Stat., Volume 13, Number 1 (2019), 420-443.

Received: July 2017
Revised: June 2018
First available in Project Euclid: 10 April 2019

Keywords

Causal inference; principal stratification; item response theory; latent variables; Bayesian; educational technology


Sales, Adam C.; Pane, John F. The role of mastery learning in an intelligent tutoring system: Principal stratification on a latent variable. Ann. Appl. Stat. 13 (2019), no. 1, 420–443. doi:10.1214/18-AOAS1196.


Supplemental materials

  • Supplement to “The role of mastery learning in an intelligent tutoring system: Principal stratification on a latent variable.” We provide modeling details, Stan code, and an extensive set of model goodness-of-fit and sensitivity analyses and plots.