Statistical Science

A Closer Look at Testing the “No-Treatment-Effect” Hypothesis in a Comparative Experiment

Joseph B. Lang



Standard tests of the “no-treatment-effect” hypothesis for a comparative experiment include permutation tests, the Wilcoxon rank sum test, two-sample $t$ tests, and Fisher-type randomization tests. Practitioners are aware that these procedures test different no-effect hypotheses and are based on different modeling assumptions. However, this awareness is not always, or even usually, accompanied by a clear understanding or appreciation of these differences. Borrowing from the rich literatures on causality and finite-population sampling theory, this paper develops a modeling framework that affords answers to several important questions, including: exactly what hypothesis is being tested, what model assumptions are being made, and are there other, perhaps better, approaches to testing a no-effect hypothesis? The framework lends itself to clear descriptions of three main inference approaches: process-based, randomization-based, and selection-based. It also promotes careful consideration of model assumptions and targets of inference, and highlights the importance of randomization. Along the way, Fisher-type randomization tests are compared to permutation tests and a less well-known Neyman-type randomization test. A simulation study compares the operating characteristics of the Neyman-type randomization test to those of the other more familiar tests.
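As an illustration of one of the procedures named above, the following is a minimal sketch of a Fisher-type randomization test for a completely randomized design. It is not taken from the paper: the function name `randomization_test`, the choice of the absolute difference in group means as the test statistic, and the two-sided p-value definition are all assumptions made here for illustration. The sketch tests the sharp null hypothesis that no unit's response depends on its treatment assignment, and it enumerates every possible assignment, so it is only practical for small experiments.

```python
from itertools import combinations


def randomization_test(treated, control):
    """Exact two-sided randomization p-value under the sharp null (illustrative).

    Under the sharp null, the observed responses are treated as fixed and
    only the treatment labels are random.
    """
    pooled = list(treated) + list(control)
    n, n_t = len(pooled), len(treated)
    total = sum(pooled)

    def diff_in_means(treated_idx):
        # Mean of the units labelled "treated" minus mean of the rest.
        t_sum = sum(pooled[i] for i in treated_idx)
        return t_sum / n_t - (total - t_sum) / (n - n_t)

    # The observed assignment puts the first n_t pooled units in the treated group.
    observed = abs(diff_in_means(range(n_t)))

    # Enumerate every assignment a completely randomized design could produce.
    stats = [abs(diff_in_means(idx)) for idx in combinations(range(n), n_t)]

    # Small tolerance guards against floating-point near-ties.
    extreme = sum(s >= observed - 1e-12 for s in stats)
    return extreme / len(stats)


if __name__ == "__main__":
    # Hypothetical responses, made up for this example.
    print(randomization_test([5, 6, 7], [1, 2, 3]))
```

With three units per group there are 20 equally likely assignments, so the smallest attainable two-sided p-value in this sketch is 2/20. A permutation test can use the same arithmetic, but it is justified by a different model: exchangeability of the responses rather than the physical act of randomization.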

Article information

Statist. Sci., Volume 30, Number 3 (2015), 352-371.

First available in Project Euclid: 10 August 2015

Keywords: Causal effects; completely randomized design; finite-population sampling theory; Fisher vs. Neyman; Fisher’s exact test; Horvitz–Thompson estimator; nonmeasurable probability sample; permutation tests; potential variables; process-based inference; randomization-based inference; randomization tests; selection-based inference


Lang, Joseph B. A Closer Look at Testing the “No-Treatment-Effect” Hypothesis in a Comparative Experiment. Statist. Sci. 30 (2015), no. 3, 352–371. doi:10.1214/15-STS513.


