Estimating a difference of Kullback–Leibler risks using a normalized difference of AIC



The Annals of Applied Statistics

D. Commenges, A. Sayyareh, L. Letenneur, J. Guedj, and A. Bar-Hen

Source: Ann. Appl. Stat. Volume 2, Number 3 (2008), 1123-1142.

Abstract

AIC is commonly used for model selection but the precise value of AIC has no direct interpretation. We are interested in quantifying a difference of risks between two models. This may be useful from an explanatory point of view or for prediction, where a simpler model may be preferred if it does nearly as well as a more complex model. The difference of risks can be interpreted by linking the risks with relative errors in the computation of probabilities and looking at the values obtained for simple models. A scale of values going from negligible to large is proposed. We propose a normalization of a difference of Akaike criteria for estimating the difference of expected Kullback–Leibler risks between maximum likelihood estimators of the distribution in two different models. The variability of this statistic can be estimated. Thus, an interval can be constructed which contains the true difference of expected Kullback–Leibler risks with a pre-specified probability. A simulation study shows that the method works, and the method is illustrated on two examples. The first is a study of the relationship between body-mass index and depression in elderly people. The second is the choice between models of HIV dynamics, where one model makes the distinction between activated CD4+ T lymphocytes and the other does not.
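
The idea of a normalized AIC difference with an interval can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the abstract does not state the normalization or the variance estimator, so the code assumes that the difference of AIC is divided by 2n (n being the sample size) and that the variability is estimated from the sample variance of the per-observation log-likelihood differences, with a normal approximation for the interval. The function name and arguments are hypothetical.

```python
import math
from statistics import NormalDist, variance


def normalized_aic_difference(loglik1, loglik2, k1, k2, level=0.95):
    """Return (d, (lower, upper)) for two fitted models.

    loglik1, loglik2: per-observation log-likelihoods evaluated at each
        model's maximum likelihood estimate (same observations, same order).
    k1, k2: number of estimated parameters in each model.
    level: nominal coverage of the interval (normal approximation).
    """
    n = len(loglik1)
    if n != len(loglik2) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    aic1 = -2.0 * sum(loglik1) + 2.0 * k1
    aic2 = -2.0 * sum(loglik2) + 2.0 * k2

    # Assumed normalization: divide the AIC difference by 2n so the
    # statistic is on the scale of a per-observation risk difference.
    d = (aic1 - aic2) / (2.0 * n)

    # Assumed variability estimate: sample variance of the
    # per-observation log-likelihood differences, divided by n.
    diffs = [b - a for a, b in zip(loglik1, loglik2)]
    se = math.sqrt(variance(diffs) / n)

    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return d, (d - z * se, d + z * se)


# Toy numbers, made up purely for illustration.
d, (lower, upper) = normalized_aic_difference(
    [-1.0, -1.2, -0.8, -1.0], [-0.9, -1.0, -0.9, -0.8], k1=2, k2=3
)
print(d, lower, upper)
```

Under these assumptions, a positive value suggests that the first model has the larger estimated risk, and an interval that covers zero suggests that the data do not clearly separate the two models.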

Keywords: Akaike criterion; body-mass index; depression; HIV dynamics; Kullback–Leibler; logistic regression; model choice

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1223908055
Digital Object Identifier: doi:10.1214/08-AOAS176


2009 © Institute of Mathematical Statistics