The Annals of Applied Statistics

Phenomenological forecasting of disease incidence using heteroskedastic Gaussian processes: A dengue case study

Leah R. Johnson, Robert B. Gramacy, Jeremy Cohen, Erin Mordecai, Courtney Murdock, Jason Rohr, Sadie J. Ryan, Anna M. Stewart-Ibarra, and Daniel Weikel

Full-text: Open access


Abstract

In 2015 the US federal government sponsored a dengue forecasting competition using historical case data from Iquitos, Peru and San Juan, Puerto Rico. Competitors were evaluated on several aspects of out-of-sample forecasts, including the targets of peak week, peak incidence during that week, and total season incidence across each of several seasons. Our team was one of the winners of that competition, outperforming other teams on multiple targets and locales. In this paper we report on our methodology, a large component of which, surprisingly, ignores the known biology of epidemics at large (for example, relationships between dengue transmission and environmental factors) and instead relies on flexible nonparametric nonlinear Gaussian process (GP) regression fits that “memorize” the trajectories of past seasons, and then “match” the dynamics of the unfolding season to past ones in real time. Our phenomenological approach has advantages in situations where disease dynamics are less well understood, where measurements and forecasts of ancillary covariates such as precipitation are unavailable, or where the strength of association with cases is as yet unknown. In particular, we show that the GP approach generally outperforms a more classical generalized linear (autoregressive) model (GLM) that we developed to utilize abundant covariate information. We illustrate variations of our methods on the two benchmark locales alongside a full summary of results submitted by other contest competitors.
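As a rough illustration of the idea of letting a GP "memorize" past seasonal trajectories, the following minimal sketch pools several past seasons, fits a GP over week-of-season, and reads the three contest targets off the predictive mean. It is not the authors' implementation: it assumes a plain squared-exponential kernel with a single constant noise level (so it is homoskedastic, unlike the paper's method), it omits the step that matches an unfolding season to past ones, and the data, the `toy_season` helper, and all hyperparameter values are made up for illustration.

```python
import numpy as np


def sq_exp_kernel(a, b, lengthscale, variance):
    """Squared-exponential covariance between 1-D input vectors a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)


def gp_predict(x_train, y_train, x_test, lengthscale=4.0, variance=4.0, noise=0.1):
    """Posterior mean and pointwise variance of a zero-mean GP.

    `noise` is a single constant noise variance (a simplifying assumption).
    """
    K = sq_exp_kernel(x_train, x_train, lengthscale, variance)
    K += noise * np.eye(len(x_train))
    Ks = sq_exp_kernel(x_test, x_train, lengthscale, variance)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = variance - np.sum(v**2, axis=0)
    return mean, var


# Synthetic stand-in for (transformed) weekly incidence in three past seasons.
# These numbers are invented; they are not the Iquitos or San Juan data.
rng = np.random.default_rng(0)
weeks = np.arange(1, 53, dtype=float)


def toy_season(peak_week, height):
    """Hypothetical bell-shaped season used only to generate toy data."""
    return height * np.exp(-0.5 * ((weeks - peak_week) / 6.0) ** 2)


past = [toy_season(28, 3.0), toy_season(30, 3.5), toy_season(26, 2.5)]
x_train = np.tile(weeks, len(past))  # week-of-season, repeated per season
y_train = np.concatenate(past) + rng.normal(0.0, 0.1, size=weeks.size * len(past))

# Predict a "typical" season and derive the three forecasting targets.
mean, var = gp_predict(x_train, y_train, weeks)
peak_week = int(weeks[np.argmax(mean)])  # week with the highest predicted value
peak_value = float(mean.max())           # predicted value in that week
season_total = float(mean.sum())         # predicted total over the season

print(peak_week, round(peak_value, 2), round(season_total, 2))
```

Because past seasons are stacked as replicated observations at the same input weeks, the posterior mean behaves like a smoothed average trajectory. A heteroskedastic fit would instead let the noise level vary over the season, which matters when variability is much larger near the peak than off-season.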

Article information

Ann. Appl. Stat., Volume 12, Number 1 (2018), 27-66.

Received: May 2017
Revised: August 2017
First available in Project Euclid: 9 March 2018

Keywords: Epidemiology; Gaussian process; heteroskedastic modeling; latent variable; generalized linear (autoregressive) model; dengue fever


Johnson, Leah R.; Gramacy, Robert B.; Cohen, Jeremy; Mordecai, Erin; Murdock, Courtney; Rohr, Jason; Ryan, Sadie J.; Stewart-Ibarra, Anna M.; Weikel, Daniel. Phenomenological forecasting of disease incidence using heteroskedastic Gaussian processes: A dengue case study. Ann. Appl. Stat. 12 (2018), no. 1, 27–66. doi:10.1214/17-AOAS1090.




Supplemental materials

  • Supplement A: Supplement to: Phenomenological forecasting of disease incidence using heteroskedastic Gaussian processes: a dengue case study: hetgp San Juan predictions. We provide the full forecasting results for San Juan using the heteroskedastic GP methods.
  • Supplement B: Supplement to: Phenomenological forecasting of disease incidence using heteroskedastic Gaussian processes: a dengue case study: GLM San Juan predictions. We provide the full forecasting results for San Juan using the GLM model.