The Annals of Applied Statistics

Experimental designs for multiple-level responses, with application to a large-scale educational intervention

Brenda Jenney and Sharon Lohr

Full-text: Open access


Educational research often studies subjects in naturally clustered groups such as classrooms or schools. When designing a randomized experiment to evaluate an intervention directed at teachers, but with effects on both teachers and their students, the power or anticipated variance for the treatment effect needs to be examined at both levels. If the treatment is applied to clusters, power is usually reduced. At the same time, a cluster design decreases the probability of contamination, which can itself reduce the power to detect a treatment effect. Designs that are optimal at one level may be inefficient for estimating the treatment effect at another level. In this paper we study the efficiency of three designs and their ability to detect a treatment effect: randomizing schools to treatment, randomizing teachers within schools to treatment, and completely randomizing teachers to treatment. The three designs are compared at both the teacher and student level within the mixed model framework, and a simulation study compares expected treatment-effect variances for the three designs under various levels of correlation within and between clusters. We present a computer program that study designers can use to explore the anticipated variances of treatment effects under proposed experimental designs and settings.
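The simulation comparison described in the abstract can be sketched as a small Monte Carlo program. The sketch below is illustrative only, not the authors' program: the variance components, cluster sizes, null model (no true treatment effect) and simple difference-in-means estimator are all assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_variance(design, n_schools=20, teachers_per_school=4,
                      sigma_school=1.0, sigma_teacher=1.0, reps=2000):
    """Monte Carlo variance of the difference-in-means treatment estimate
    at the teacher level, under one of three randomization schemes.
    Parameter values are illustrative, not those of the paper."""
    n = n_schools * teachers_per_school
    school = np.repeat(np.arange(n_schools), teachers_per_school)
    est = np.empty(reps)
    for r in range(reps):
        u = rng.normal(0, sigma_school, n_schools)   # random school effects
        e = rng.normal(0, sigma_teacher, n)          # teacher-level errors
        y = u[school] + e                            # null model: no treatment effect
        if design == "school":        # randomize schools to treatment
            treated = rng.choice(n_schools, n_schools // 2, replace=False)
            t = np.isin(school, treated)
        elif design == "within":      # randomize teachers within each school
            t = np.zeros(n, dtype=bool)
            for j in range(n_schools):
                idx = np.where(school == j)[0]
                t[rng.choice(idx, teachers_per_school // 2, replace=False)] = True
        else:                         # completely randomize teachers
            t = np.zeros(n, dtype=bool)
            t[rng.choice(n, n // 2, replace=False)] = True
        est[r] = y[t].mean() - y[~t].mean()
    return est.var()

for d in ("school", "within", "complete"):
    print(d, round(simulate_variance(d), 3))
```

Under this setup the within-school design yields the smallest variance (school effects cancel within clusters) and the school-level design the largest, though the school design may still be preferred when contamination between teachers in the same school is a concern.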

Article information

Ann. Appl. Stat., Volume 3, Number 2 (2009), 691–709.

First available in Project Euclid: 22 June 2009


Keywords: Anticipated variance; contamination; hierarchical design; multilevel response; randomization


Jenney, Brenda; Lohr, Sharon. Experimental designs for multiple-level responses, with application to a large-scale educational intervention. Ann. Appl. Stat. 3 (2009), no. 2, 691--709. doi:10.1214/08-AOAS216.


  • Berk, R. A., Ladd, H., Graziano, H. and Baek, J.-H. (2003). A randomized experiment testing inmate classification systems. Criminology & Public Policy 2 215–242.
  • CRESMET, Arizona State University (2007). Project Pathways (MSP). Available at
  • Bloom, H., Bos, J. M. and Lee, S.-W. (1999). Using cluster random assignment to measure program impacts: Statistical implications for the evaluation of education programs. Evaluation Review 23 445–469.
  • Bloom, H., Richburg-Hayes, L. and Black, A. R. (2005). Using covariates to improve precision: Empirical guidance for studies that randomize schools to measure the impacts of educational interventions. MDRC Working Papers on Research Methodology. Available at
  • Borman, G. D., Slavin, R. E., Cheung, A., Chamberlain, A. M., Madden, N. A. and Chambers, B. (2005). Success for all: First-year results from the national randomized field trial. Educational Evaluation and Policy Analysis 27 1–22.
  • Boruch, R. (2002). The virtues of randomness. Education Next Fall 37–41.
  • Chuang, J.-H., Hripcsak, G. and Heitjan, D. (2002). Design and analysis of controlled trials in naturally clustered environments: Implications for medical informatics. Journal of the American Medical Informatics Association 9 230–238.
  • Cook, T. D. (2003). Why have educational evaluators chosen not to do randomized experiments? Annals of the American Academy of Political and Social Science 589 114–149.
  • Cook, T. D. (2005). Emergent principles for the design, implementation, and analysis of cluster-based experiments in social science. Annals of the American Academy of Political and Social Science 599 176–198.
  • Cook, T. D. and Payne, M. R. (2002). Objecting to the objections to using random assignment in educational research. In Evidence Matters: Randomized Trials in Education Research (F. Mosteller and R. Boruch, eds.) 150–178. Brookings Institution Press, Washington, DC.
  • Cox, D. and Reid, N. (2000). The Theory of the Design of Experiments. Chapman and Hall/CRC Press, Boca Raton, FL.
  • Demidenko, E. (2004). Mixed Models. Wiley, Hoboken, NJ.
  • Gail, M. H., Mark, S. D., Carroll, R. J., Green, S. B. and Pee, D. (1996). On design considerations and randomization-based inference for community intervention trials. Stat. Med. 15 1069–1092.
  • Gueron, J. (2005). Throwing good money after bad: A common error misleads foundations and policymakers. Stanford Social Innovation Review Fall 69–71.
  • Hoff, P. D. (2003). Random effects models for network data. Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers. Available at
  • Jenney, B. and Lohr, S. (2008a). Supplement to “Experimental designs for multiple-level responses, with application to a large-scale educational intervention.” DOI: 10.1214/08-AOAS216SUPPA.
  • Jenney, B. and Lohr, S. (2008b). Supplement to “Experimental designs for multiple-level responses, with application to a large-scale educational intervention.” DOI: 10.1214/08-AOAS216SUPPB.
  • Jo, B. (2002). Statistical power in randomized intervention studies with noncompliance. Psychological Methods 7 178–193.
  • Johnson, T. (1998). Clinical trials in psychiatry: Background and statistical perspective. Stat. Methods Med. Res. 7 209–234.
  • Liu, X., Spybrook, J., Congdon, R. and Raudenbush, S. (2006). Optimal design software for multi-level and longitudinal research v1.77. Available at
  • McCaffrey, D. F., Koretz, D., Louis, T. A. and Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics 29 67–101.
  • Moerbeek, M. (2005). Randomization of clusters versus randomization of persons within clusters: Which is preferable? Amer. Statist. 59 72–78.
  • Moerbeek, M., van Breukelen, G. J. P. and Berger, M. P. F. (2000). Design issues for experiments in multilevel populations. Journal of Educational and Behavioral Statistics 25 271–284.
  • Raudenbush, S. (1997). Statistical analysis and optimal design for cluster randomized trials. Psychological Methods 2 173–185.
  • Raudenbush, S. and Liu, X. (2000). Statistical power and optimal design for multisite randomized trials. Psychological Methods 5 199–213.
  • SAS Institute Inc. (2008). SAS/STAT 9.2 user’s guide. SAS Institute Inc., Cary, NC.
  • Wasserman, S. and Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge Univ. Press, Cambridge.
  • What Works Clearinghouse (2006). Evidence standards for reviewing studies. Available at

Supplemental materials