The Annals of Applied Statistics

Improving covariate balance in $2^{K}$ factorial designs via rerandomization with an application to a New York City Department of Education High School Study

Zach Branson, Tirthankar Dasgupta, and Donald B. Rubin


A few years ago, the New York Department of Education (NYDE) was planning to conduct an experiment involving five new intervention programs for a selected set of New York City high schools. The goal was to estimate the causal effects of these programs and their interactions on the schools’ performance. For each of the schools, about 50 premeasured covariates were available. The schools could be randomly assigned to the 32 treatment combinations of this $2^{5}$ factorial experiment, but such an allocation could have resulted in a huge covariate imbalance across treatment groups. Standard methods used to prevent confounding of treatment effects with covariate effects (e.g., blocking) were not intuitive due to the large number of covariates. In this paper, we explore how the recently proposed and studied method of rerandomization can be applied to this problem and other factorial experiments. We propose how to implement rerandomization in factorial experiments, extend the theoretical properties of rerandomization from single-factor experiments to $2^{K}$ factorial designs, and demonstrate, using the NYDE data, how such a designed experiment can improve precision of estimated factorial effects.
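
The rerandomization idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm or their R code: the acceptance rule (the largest Mahalanobis distance over all factorial-effect contrasts must not exceed a threshold), the function names, the simulated covariates, and the threshold value are all assumptions made for illustration. The sketch assumes a balanced $2^{K}$ design, so each effect contrast splits the $N$ units into two groups of size $N/2$ and the distance for a contrast is $(N/4)\, d^{\top} S^{-1} d$, where $d$ is the difference in covariate means and $S$ is the sample covariance matrix of the covariates.

```python
import itertools
import math
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def covariance(X):
    """Sample covariance matrix of the rows of X (units by covariates)."""
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    return [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in X) / (n - 1)
             for j in range(p)] for i in range(p)]


def mahalanobis(X, signs, S):
    """(N/4) * d' S^{-1} d for the +1 versus -1 units of one effect contrast."""
    n, p = len(X), len(X[0])
    plus = [x for x, s in zip(X, signs) if s > 0]
    minus = [x for x, s in zip(X, signs) if s < 0]
    d = [sum(r[j] for r in plus) / len(plus)
         - sum(r[j] for r in minus) / len(minus) for j in range(p)]
    return n / 4 * sum(di * zi for di, zi in zip(d, solve(S, d)))


def worst_imbalance(X, alloc, S):
    """Largest Mahalanobis distance over all main effects and interactions."""
    K = len(alloc[0])
    worst = 0.0
    for size in range(1, K + 1):
        for subset in itertools.combinations(range(K), size):
            # An interaction contrast is the product of its factors' levels.
            signs = [math.prod(combo[k] for k in subset) for combo in alloc]
            worst = max(worst, mahalanobis(X, signs, S))
    return worst


def random_assignment(n_units, K, rng):
    """Balanced random allocation of units to the 2^K treatment combinations."""
    combos = list(itertools.product((-1, 1), repeat=K))
    assert n_units % len(combos) == 0, "sketch assumes equal replication"
    alloc = combos * (n_units // len(combos))
    rng.shuffle(alloc)
    return alloc


def rerandomize(X, K, threshold, rng, max_tries=10000):
    """Redraw allocations until the (assumed) balance criterion is met."""
    S = covariance(X)
    for tries in range(1, max_tries + 1):
        alloc = random_assignment(len(X), K, rng)
        m = worst_imbalance(X, alloc, S)
        if m <= threshold:
            return alloc, m, tries
    raise RuntimeError("no acceptable allocation found")


# Hypothetical data: 32 units, 3 simulated covariates, a 2^2 design.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(32)]
alloc, m, tries = rerandomize(X, K=2, threshold=2.0, rng=rng)
print(f"accepted after {tries} draws, worst distance {m:.3f}")
```

A smaller threshold gives tighter covariate balance at the cost of more redraws, so the threshold in this sketch is a tuning choice rather than a recommended value.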

Article information

Ann. Appl. Stat., Volume 10, Number 4 (2016), 1958-1976.

Received: April 2016
Revised: June 2016
First available in Project Euclid: 5 January 2017

Digital Object Identifier: doi:10.1214/16-AOAS959

Keywords: Experimental design, treatment allocation, randomization, Mahalanobis distance, factorial effects


Branson, Zach; Dasgupta, Tirthankar; Rubin, Donald B. Improving covariate balance in $2^{K}$ factorial designs via rerandomization with an application to a New York City Department of Education High School Study. Ann. Appl. Stat. 10 (2016), no. 4, 1958-1976. doi:10.1214/16-AOAS959.

Supplemental materials

  • Dataset and R Code for “Improving covariate balance in $2^{K}$ factorial designs via rerandomization with an application to a New York City Department of Education High School Study.” We provide the NYDE dataset discussed in the paper, as well as the R code used to implement the rerandomization algorithm for this dataset.