## Statistical Science

### Contemporary Frequentist Views of the $2\times2$ Binomial Trial

#### Abstract

The $2\times2$ table is the simplest of data structures, yet it is of immense practical importance. It is also just complex enough to provide a theoretical testing ground for general frequentist methods. Yet after 70 years of debate, its correct analysis is still not settled. Rather than recount the entire history, our review is motivated by contemporary developments in likelihood and testing theory, as well as by computational advances. We examine both conditional and unconditional tests. Within the conditional framework, we explain the relationship of Fisher’s test to variants such as mid-$p$ and Liebermeister’s test, as well as to modern developments in likelihood theory, such as $p^{*}$ and approximate conditioning. Within the unconditional framework, we consider four modern methods of correcting approximate tests so that they properly control size by accounting for the unknown value of the nuisance parameter: maximisation (M), partial maximisation (B), estimation (E) and estimation followed by maximisation ($\mbox{E}+\mbox{M}$). Under the conditional model, we recommend Fisher’s test. For the unconditional model, amongst standard approximate methods, Liebermeister’s tests come closest to controlling size. However, our best recommendation is the E procedure applied to the signed root likelihood statistic, as this performs very well in terms of size and power and is easily computed. We support our assertions with a numerical study.
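To make the conditional variants named in the abstract concrete, here is a minimal sketch of the three one-sided $p$-values for testing $p_1\le p_2$ against $p_1>p_2$. It is not taken from the paper: the function names are hypothetical, and the definitions are the standard textbook ones. Fisher's $p$-value is the hypergeometric upper tail, mid-$p$ subtracts half the probability of the observed table, and Liebermeister's $p$-value is computed as Fisher's $p$-value after adding one success to the first group and one failure to the second.

```python
from math import comb


def hypergeom_pmf(k, n1, n2, s):
    """P(X1 = k | X1 + X2 = s) when X1 ~ Bin(n1, p) and X2 ~ Bin(n2, p)."""
    if k < max(0, s - n2) or k > min(n1, s):
        return 0.0
    return comb(n1, k) * comb(n2, s - k) / comb(n1 + n2, s)


def fisher_p(x1, n1, x2, n2):
    """One-sided Fisher p-value: P(X1 >= x1 | X1 + X2 = x1 + x2)."""
    s = x1 + x2
    return sum(hypergeom_pmf(k, n1, n2, s) for k in range(x1, min(n1, s) + 1))


def mid_p(x1, n1, x2, n2):
    """Mid-p: Fisher's tail minus half the probability of the observed table."""
    s = x1 + x2
    return fisher_p(x1, n1, x2, n2) - 0.5 * hypergeom_pmf(x1, n1, n2, s)


def liebermeister_p(x1, n1, x2, n2):
    """Liebermeister's p-value, computed as Fisher's p-value on the table
    with one extra success in group 1 and one extra failure in group 2."""
    return fisher_p(x1 + 1, n1 + 1, x2, n2 + 1)


if __name__ == "__main__":
    # Illustrative data only: 7/10 successes versus 2/10 successes.
    table = (7, 10, 2, 10)
    print("Fisher       :", fisher_p(*table))
    print("mid-p        :", mid_p(*table))
    print("Liebermeister:", liebermeister_p(*table))
```

Both variants are never larger than Fisher's $p$-value, which is why they are less conservative; whether they still control size is exactly the kind of question the numerical study addresses.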

#### Article information

Source
Statist. Sci., Volume 32, Number 4 (2017), 600–615.

Dates
First available in Project Euclid: 28 November 2017

Permanent link to this document
https://projecteuclid.org/euclid.ss/1511838030

Digital Object Identifier
doi:10.1214/17-STS627

Mathematical Reviews number (MathSciNet)
MR3730524

Zentralblatt MATH identifier
1384.62115

#### Citation

Ripamonti, Enrico; Lloyd, Chris; Quatto, Piero. Contemporary Frequentist Views of the $2\times2$ Binomial Trial. Statist. Sci. 32 (2017), no. 4, 600–615. doi:10.1214/17-STS627. https://projecteuclid.org/euclid.ss/1511838030

#### References

• Agresti, A. (1992). A survey of exact inference for contingency tables. Statist. Sci. 7 131–177. With comments and a rejoinder by the author.
• Agresti, A. (2001). Exact inference for categorical data: Recent advances and continuing controversies. Stat. Med. 20 2709–2722.
• Agresti, A. (2002). Categorical Data Analysis. Wiley, Hoboken, NJ.
• Barnard, G. A. (1945). A new test for $2\times2$ tables. Nature 156 177.
• Barnard, G. A. (1947). Significance tests for $2\times2$ tables. Biometrika 34 123–138.
• Barndorff-Nielsen, O. (1973). On $M$-ancillarity. Biometrika 60 447–455.
• Barndorff-Nielsen, O. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70 343–365.
• Basu, D. (1977). On the elimination of nuisance parameters. J. Amer. Statist. Assoc. 72 355–366.
• Berger, R. L. and Boos, D. D. (1994). $P$ values maximized over a confidence set for the nuisance parameter. J. Amer. Statist. Assoc. 89 1012–1016.
• Berger, R. L. and Sidik, K. (2003). Exact unconditional tests for a $2\times2$ matched-pairs design. Stat. Methods Med. Res. 12 91–108.
• Berkson, J. (1978). In dispraise of the exact test: Do the marginal totals of the $2\times2$ table contain relevant information respecting the table proportions? J. Statist. Plann. Inference 2 27–42.
• Boschloo, R. D. (1970). Raised conditional level of significance for the $2\times2$-table when testing the equality of two probabilities. Stat. Neerl. 24 1–35.
• Brown, L. D., Cai, T. T. and DasGupta, A. (2001). Interval estimation for a binomial proportion. Statist. Sci. 16 101–133.
• Brown, L. D., Cai, T. T. and DasGupta, A. (2002). Confidence intervals for a binomial proportion and asymptotic expansions. Ann. Statist. 30 160–201.
• Choi, L., Blume, J. D. and Dupont, W. D. (2015). Elucidating the foundations of statistical inference with $2\times2$ tables. PLoS ONE 10 e0121263.
• Cox, D. R. (1980). Local ancillarity. Biometrika 67 279–286.
• Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. Chapman & Hall, London.
• Davison, A. C., Fraser, D. A. S. and Reid, N. (2006). Improved likelihood inference for discrete data. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 495–508.
• Finner, H. and Strassburger, K. (2002). Structural properties of UMPU-tests for $2\times2$ tables and some applications. J. Statist. Plann. Inference 104 103–120.
• Fisher, R. A. (1935). The Design of Experiments, 1st ed. Oliver and Boyd, London.
• Gail, M. H. and Gart, J. J. (1973). The determination of sample sizes for use with the exact conditional test in $2\times2$ comparative trials. Biometrics 29 441–448.
• Gart, J. J. (1969). An exact test for comparing matched proportions in crossover designs. Biometrika 56 75–80.
• Godambe, V. P. (1980). On sufficiency and ancillarity in the presence of a nuisance parameter. Biometrika 67 155–162.
• Harris, B. and Soms, A. P. (1991). Theory and counterexamples for confidence limits on system reliability. Statist. Probab. Lett. 11 411–417.
• Hirji, K. F. (2006). Exact Analysis of Discrete Data. Chapman & Hall/CRC, Boca Raton, FL.
• Hirji, K. F., Mehta, C. R. and Patel, N. R. (1988). Exact inference for matched case-control studies. Biometrics 44 803–814.
• Hirji, K. F., Tan, S. J. and Elashoff, R. M. (1991). A quasi-exact test for comparing two binomial proportions. Stat. Med. 10 1137–1153.
• Howard, J. V. (1998). The $2\times2$ table: A discussion from a Bayesian viewpoint. Statist. Sci. 13 351–367.
• Hwang, J. T. G. and Yang, M.-C. (2001). An optimality theory for mid $p$-values in $2\times2$ contingency tables. Statist. Sinica 11 807–826.
• Kabaila, P. (2005). Computation of exact confidence limits from discrete data. Comput. Statist. 20 401–414.
• Kabaila, P. and Lloyd, C. J. (2006). Improved Buehler limits based on refined designated statistics. J. Statist. Plann. Inference 136 3145–3155.
• Lancaster, H. O. (1961). Significance tests in discrete distributions. J. Amer. Statist. Assoc. 56 223–234.
• Lehmann, E. L. (1959). Testing Statistical Hypotheses. Wiley, New York.
• Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed. Springer, New York.
• Liebermeister, C. (1877). Über Wahrscheinlichkeitsrechnung in Anwendung Auf Therapeutische Statistik. Breitkopf and Härtel.
• Lloyd, C. J. (2008a). A new exact and more powerful unconditional test of no treatment effect from binary matched pairs. Biometrics 64 716–723.
• Lloyd, C. J. (2008b). Exact $P$-values for discrete models obtained by estimation and maximization. Aust. N. Z. J. Stat. 50 329–345.
• Lloyd, C. J. (2010a). $P$-values based on approximate conditioning and $p^{*}$. J. Statist. Plann. Inference 140 1073–1081.
• Lloyd, C. J. (2010b). Bootstrap and second-order tests of risk difference. Biometrics 66 975–982.
• Lloyd, C. J. (2012). Computing highly accurate or exact $P$-values using importance sampling. Comput. Statist. Data Anal. 56 1784–1794.
• Lydersen, S., Fagerland, M. W. and Laake, P. (2009). Recommended tests for association in $2\times2$ tables. Stat. Med. 28 1159–1175.
• Martin Andres, A. (1991). A review of classic non-asymptotic methods for comparing two proportions by means of independent samples. Comm. Statist. Simulation Comput. 20 551–583.
• McDonald, L. L., Davis, B. M. and Milliken, G. A. (1977). A nonrandomized unconditional test for comparing two proportions in $2\times2$ contingency tables. Technometrics 19 145–158.
• Mehrotra, D. V., Chan, I. S. F. and Berger, R. L. (2003). A cautionary note on exact unconditional inference for a difference between two independent binomial proportions. Biometrics 59 441–450.
• Mehta, C. R. and Hilton, J. F. (1993). Exact power of conditional and unconditional tests: Going beyond the $2\times2$ contingency table. Amer. Statist. 47 91–98.
• Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Mag. 5 157–175.
• Pierce, D. A. and Peters, D. (1992). Practical use of higher order asymptotics for multiparameter exponential families. J. R. Stat. Soc. Ser. B. Stat. Methodol. 54 701–737.
• Pierce, D. A. and Peters, D. (1999). Improving on exact tests by approximate conditioning. Biometrika 86 265–277.
• Ripamonti, E., Lloyd, C. and Quatto, P. (2017). Supplement to “Contemporary frequentist views of the $2\times2$ binomial trial.” DOI:10.1214/17-STS627SUPP.
• Röhmel, J. (2005). Problems with existing procedures to calculate exact unconditional $p$-values for non-inferiority/superiority and confidence intervals for two binomials and how to resolve them. Biom. J. 47 37–47.
• Röhmel, J. and Mansmann, U. (1999). Unconditional non-asymptotic one-sided tests for independent binomial proportions when the interest lies in showing non-inferiority and/or superiority. Biom. J. 41 149–170.
• Seneta, E. and Phipps, M. C. (2001). On the comparison of two observed frequencies. Biom. J. 43 23–43.
• Skipka, G., Munk, A. and Freitag, G. (2004). Unconditional exact tests for the difference of binomial probabilities—contrasted and compared. Comput. Statist. Data Anal. 47 757–773.
• Storer, B. E. and Kim, C. (1990). Exact properties of some exact test statistics for comparing two binomial proportions. J. Amer. Statist. Assoc. 85 146–155.
• Tocher, K. D. (1950). Extension of the Neyman–Pearson theory of tests to discontinuous variates. Biometrika 37 130–144.
• Wells, M. T. (2010). Optimality results for mid $p$-values. In Borrowing Strength: Theory Powering Applications—a Festschrift for Lawrence D. Brown. Inst. Math. Stat. (IMS) Collect. 6 184–198. IMS, Beachwood, OH.
• Yates, F. (1984). Tests of significance for $2\times2$ contingency tables. J. Roy. Statist. Soc. Ser. A 147 426–463.

#### Supplemental materials

• Supplement to “Contemporary Frequentist Views of the $2\times2$ Binomial Trial”. We provide formulas for standard approximate statistics and adjusted $p$-values. We describe the numerical study in detail.
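
As a rough illustration of the adjusted $p$-values mentioned above, the following sketch computes E and M $p$-values for the signed root likelihood statistic in the one-sided test of $p_1\le p_2$ against $p_1>p_2$. It is not the supplement's code, and the function names are hypothetical. It assumes the usual definitions: the E $p$-value is the exact tail probability of the statistic with the common success probability set to its pooled estimate, and the M $p$-value is the largest such tail probability over all values of the nuisance parameter, approximated here on a finite grid rather than by a true supremum. The B and $\mbox{E}+\mbox{M}$ procedures are not shown.

```python
from math import comb, log, sqrt


def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * log(y)


def _loglik(x, n, p):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    return _xlogy(x, p) + _xlogy(n - x, 1.0 - p)


def signed_root(x1, n1, x2, n2):
    """Signed root of the likelihood ratio statistic for H0: p1 = p2."""
    p1, p2, p0 = x1 / n1, x2 / n2, (x1 + x2) / (n1 + n2)
    full = _loglik(x1, n1, p1) + _loglik(x2, n2, p2)
    null = _loglik(x1, n1, p0) + _loglik(x2, n2, p0)
    w = max(0.0, 2.0 * (full - null))  # guard against tiny negative rounding
    return sqrt(w) if p1 >= p2 else -sqrt(w)


def _binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def tail_prob(t_obs, n1, n2, p):
    """P(T >= t_obs) when both groups share success probability p."""
    total = 0.0
    for y1 in range(n1 + 1):
        for y2 in range(n2 + 1):
            if signed_root(y1, n1, y2, n2) >= t_obs - 1e-12:
                total += _binom_pmf(y1, n1, p) * _binom_pmf(y2, n2, p)
    return total


def e_p_value(x1, n1, x2, n2):
    """E p-value: tail probability at the pooled estimate of p."""
    t_obs = signed_root(x1, n1, x2, n2)
    return tail_prob(t_obs, n1, n2, (x1 + x2) / (n1 + n2))


def m_p_value(x1, n1, x2, n2, grid_size=201):
    """M p-value: maximum tail probability over an evenly spaced grid of p
    (a grid approximation to the supremum over [0, 1])."""
    t_obs = signed_root(x1, n1, x2, n2)
    return max(
        tail_prob(t_obs, n1, n2, i / (grid_size - 1)) for i in range(grid_size)
    )


if __name__ == "__main__":
    # Illustrative data only: 7/10 successes versus 3/10 successes.
    print("E p-value:", e_p_value(7, 10, 3, 10))
    print("M p-value:", m_p_value(7, 10, 3, 10))
```

The double loop enumerates all $(n_1+1)(n_2+1)$ possible tables, which is fine for small trials; for larger sample sizes one would precompute the set of tables at least as extreme as the observed one and reuse it across grid points.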