The Annals of Applied Statistics

An exact adaptive test with superior design sensitivity in an observational study of treatments for ovarian cancer

Paul R. Rosenbaum



A sensitivity analysis in an observational study determines the magnitude of bias from nonrandom treatment assignment that would need to be present to alter the qualitative conclusions of a naïve analysis that presumes all biases were removed by matching or by other analytic adjustments. The power of a sensitivity analysis and the design sensitivity anticipate the outcome of a sensitivity analysis under an assumed model for the generation of the data. It is known that the power of a sensitivity analysis is affected by the choice of test statistic. In particular, a statistic with good Pitman efficiency in a randomized experiment, such as Wilcoxon’s signed rank statistic, may have low power in a sensitivity analysis and low design sensitivity when compared to other statistics. For instance, for an additive treatment effect and errors that are Normal, logistic, or t-distributed with 3 degrees of freedom, Brown’s combined quantile average test has Pitman efficiency close to that of Wilcoxon’s test but has higher power in a sensitivity analysis. A version of Noether’s test, by contrast, has poor Pitman efficiency in a randomized experiment but much higher design sensitivity, so it is vastly more powerful than Wilcoxon’s statistic in a sensitivity analysis if the sample size is sufficiently large. A new exact distribution-free test is proposed that rejects if either Brown’s test or Noether’s test rejects, after adjusting the two critical values so that the overall level of the combined test remains at α, conventionally α = 0.05. In every sampling situation, the design sensitivity of the adaptive test equals the larger of the two design sensitivities of the component tests. The adaptive test exhibits good power in sensitivity analyses asymptotically and in simulations. In one sampling situation (Normal errors and an additive effect that is three-quarters of the standard deviation, with 500 matched pairs), the power of Wilcoxon’s test in a sensitivity analysis was 2% and the power of the adaptive test was 87%. A study of treatments for ovarian cancer in the Medicare population is discussed in detail.
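
The idea of rejecting when either of two tests rejects, with both critical values adjusted to hold the overall level at α, can be illustrated with a minimal Python sketch. Several details below are assumptions, not taken from the abstract. Noether's statistic is taken to be the number of positive pair differences among the top third of absolute differences, and Brown's statistic scores the middle third 1 and the top third 2. There are assumed to be no ties or zero differences. The sensitivity parameter is written `gamma`, with each pair difference positive with probability at most gamma / (1 + gamma). The adjustment is one simple choice: lower a common marginal level from α toward α/2 until the exact joint rejection probability is at most α. The function names are hypothetical, and the article's own procedure may differ.

```python
from math import comb

def binom_pmf(n, theta):
    """Exact Binomial(n, theta) probabilities for counts 0..n."""
    return [comb(n, j) * theta**j * (1 - theta)**(n - j) for j in range(n + 1)]

def critical_value(pmf, level):
    """Smallest c with P(stat >= c) <= level; c == len(pmf) means never reject."""
    c, tail = len(pmf), 0.0
    while c > 0 and tail + pmf[c - 1] <= level:
        c -= 1
        tail += pmf[c]
    return c

def adaptive_test(diffs, gamma=1.0, alpha=0.05, steps=50):
    """Reject if either sign-score test rejects, holding the joint level at alpha.

    Assumed scoring (not stated in the abstract): thirds of |diffs| scored
    0, 1, 2 for the Brown-type statistic; top third only for the Noether-type
    statistic. Intended for moderate n, since exact binomials are enumerated.
    """
    order = sorted(diffs, key=abs)
    n = len(order)
    middle, top = order[n // 3:(2 * n) // 3], order[(2 * n) // 3:]
    a_obs = sum(d > 0 for d in middle)   # positives in the middle third
    n_obs = sum(d > 0 for d in top)      # Noether-type statistic
    b_obs = a_obs + 2 * n_obs            # Brown-type statistic

    # Bounding distribution: signs independent, each positive with prob. theta.
    theta = gamma / (1 + gamma)
    pa, pn = binom_pmf(len(middle), theta), binom_pmf(len(top), theta)
    pb = [0.0] * (len(middle) + 2 * len(top) + 1)
    for a, p1 in enumerate(pa):
        for j, p2 in enumerate(pn):
            pb[a + 2 * j] += p1 * p2

    # Lower the common marginal level until the exact joint size is <= alpha.
    # The last step uses alpha / 2 for each test (the Bonferroni choice).
    for s in range(steps + 1):
        level = alpha - (alpha / 2) * s / steps
        c_n, c_b = critical_value(pn, level), critical_value(pb, level)
        size = sum(p1 * p2
                   for a, p1 in enumerate(pa)
                   for j, p2 in enumerate(pn)
                   if j >= c_n or a + 2 * j >= c_b)
        if size <= alpha:
            break
    reject = n_obs >= c_n or b_obs >= c_b
    return reject, c_n, c_b, size
```

Calling `adaptive_test(diffs, gamma=1.0)` corresponds to a randomization test with no unmeasured bias. Repeating the call with increasing `gamma` shows how much bias the rejection can withstand under the assumptions above.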

Article information

Ann. Appl. Stat., Volume 6, Number 1 (2012), 83–105.

First available in Project Euclid: 6 March 2012


Keywords: Brown’s test; combined quantile averages; design sensitivity; Noether’s test; observational study; randomization inference; sensitivity analysis; Wilcoxon’s signed rank test


Rosenbaum, Paul R. An exact adaptive test with superior design sensitivity in an observational study of treatments for ovarian cancer. Ann. Appl. Stat. 6 (2012), no. 1, 83–105. doi:10.1214/11-AOAS508.


