Statistical Science

Randomization Does Not Justify Logistic Regression

David A. Freedman

Source: Statist. Sci. Volume 23, Number 2 (2008), 237-249.

Abstract

The logit model is often used to analyze experimental data. However, randomization does not justify the model, so the usual estimators can be inconsistent. A consistent estimator is proposed. Neyman’s non-parametric setup is used as a benchmark. In this setup, each subject has two potential responses, one if treated and the other if untreated; only one of the two responses can be observed. Besides the mathematics, there are simulation results, a brief review of the literature, and some recommendations for practice.
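
As a rough illustration of the benchmark setup described above, the following Python sketch simulates it under stated assumptions. Each subject is given two fixed binary potential responses, a random subset is assigned to treatment, and only the response matching the assignment is observed. The function name, sample sizes, and response rates (0.3 and 0.5) are hypothetical choices made for illustration and are not taken from the article. The estimate computed here is the simple difference in observed proportions, which is unbiased for the average effect under random assignment; it is not the consistent estimator the article proposes.

```python
import random


def simulate_experiment(n=1000, n_treated=500, seed=0):
    """Sketch of a Neyman-style experiment with binary responses.

    All numeric settings are illustrative assumptions, not values
    from the article.
    """
    rng = random.Random(seed)

    # Fixed potential responses for each subject (hypothetical rates):
    # y_control[i] is the response if untreated, y_treated[i] if treated.
    y_control = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    y_treated = [1 if rng.random() < 0.5 else 0 for _ in range(n)]

    # Random assignment: choose n_treated subjects without replacement.
    treated = set(rng.sample(range(n), n_treated))

    # Only one of the two potential responses is observed per subject.
    observed_t = [y_treated[i] for i in range(n) if i in treated]
    observed_c = [y_control[i] for i in range(n) if i not in treated]

    # Average effect over all subjects; computable only in a simulation,
    # because it needs both potential responses for everyone.
    true_effect = (sum(y_treated) - sum(y_control)) / n

    # Difference in observed proportions between the two groups.
    estimate = (sum(observed_t) / len(observed_t)
                - sum(observed_c) / len(observed_c))
    return true_effect, estimate


if __name__ == "__main__":
    true_effect, estimate = simulate_experiment()
    print(f"true average effect: {true_effect:.3f}")
    print(f"difference in proportions: {estimate:.3f}")
```

Once the potential responses are generated, the only randomness in the sketch is the assignment itself, which is the sense in which the setup makes no modeling assumptions about how responses depend on covariates.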

Keywords: Models; randomization; logistic regression; logit; average predicted probability

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1219339115
Digital Object Identifier: doi:10.1214/08-STS262

2009 © Institute of Mathematical Statistics