The Annals of Applied Statistics

Parallel partial Gaussian process emulation for computer models with massive output

Mengyang Gu and James O. Berger

Abstract

We consider the problem of emulating (approximating) computer models (simulators) that produce massive output. The specific simulator we study is a computer model of volcanic pyroclastic flow, a single run of which produces up to $10^{9}$ outputs over a space–time grid of coordinates. An emulator (essentially a statistical model of the simulator—we use a Gaussian Process) that is computationally suitable for such massive output is developed and studied from practical and theoretical perspectives. On the practical side, the emulator does unexpectedly well in predicting what the simulator would produce, even better than much more flexible and computationally intensive alternatives. This allows the attainment of the scientific goal of this work, accurate assessment of the hazards from pyroclastic flows over wide spatial domains. Theoretical results are also developed that provide insight into the unexpected success of the massive emulator. Generalizations of the emulator are introduced that allow for a nugget, which is useful for the application to hazard assessment.

Article information

Source
Ann. Appl. Stat., Volume 10, Number 3 (2016), 1317–1347.

Dates
Received: January 2015
Revised: April 2016
First available in Project Euclid: 28 September 2016

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1475069609

Digital Object Identifier
doi:10.1214/16-AOAS934

Mathematical Reviews number (MathSciNet)
MR3553226

Zentralblatt MATH identifier
06775268

Keywords
Gaussian process; computer model emulation; space–time coordinate; objective Bayesian analysis

Citation

Gu, Mengyang; Berger, James O. Parallel partial Gaussian process emulation for computer models with massive output. Ann. Appl. Stat. 10 (2016), no. 3, 1317–1347. doi:10.1214/16-AOAS934. https://projecteuclid.org/euclid.aoas/1475069609


References

  • Andrianakis, I. and Challenor, P. G. (2012). The effect of the nugget on Gaussian process emulators of computer models. Comput. Statist. Data Anal. 56 4215–4228.
  • Bastos, L. S. and O’Hagan, A. (2009). Diagnostics for Gaussian process emulators. Technometrics 51 425–438.
  • Bayarri, M. J., Berger, J. O., Paulo, R., Sacks, J., Cafeo, J. A., Cavendish, J., Lin, C.-H. and Tu, J. (2007a). A framework for validation of computer models. Technometrics 49 138–154.
  • Bayarri, M. J., Berger, J. O., Cafeo, J., Garcia-Donato, G., Liu, F., Palomo, J., Parthasarathy, R. J., Paulo, R., Sacks, J. and Walsh, D. (2007b). Computer model validation with functional output. Ann. Statist. 35 1874–1906.
  • Bayarri, M. J., Berger, J. O., Calder, E. S., Dalbey, K., Lunagomez, S., Patra, A. K., Pitman, E. B., Spiller, E. T. and Wolpert, R. L. (2009). Using statistical and computer models to quantify volcanic hazards. Technometrics 51 402–413.
  • Bayarri, M. J., Berger, J. O., Calder, E. S., Patra, A. K., Pitman, E. B., Spiller, E. T. and Wolpert, R. L. (2015). Probabilistic quantification of hazards: A methodology using small ensembles of physics-based simulations and statistical surrogates. Int. J. Uncertain. Quantif. 5 297–325.
  • Berger, J. O., De Oliveira, V. and Sansó, B. (2001). Objective Bayesian analysis of spatially correlated data. J. Amer. Statist. Assoc. 96 1361–1374.
  • Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. Ser. B 36 192–236.
  • Conti, S. and O’Hagan, A. (2010). Bayesian emulation of complex multi-output and dynamic computer models. J. Statist. Plann. Inference 140 640–651.
  • Cox, D. R. (1975). Partial likelihood. Biometrika 62 269–276.
  • Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York.
  • Forrester, A., Sobester, A. and Keane, A. (2008). Engineering Design Via Surrogate Modelling: A Practical Guide. Wiley, New York.
  • Fricker, T. E., Oakley, J. E. and Urban, N. M. (2013). Multivariate Gaussian process emulators with nonseparable covariance structures. Technometrics 55 47–56.
  • Gelfand, A. E., Diggle, P. J., Fuentes, M. and Guttorp, P., eds. (2010). Handbook of Spatial Statistics. CRC Press, Boca Raton, FL.
  • Gu, M. (2016). Robust uncertainty quantification and scalable computation for computer models with massive output. Ph.D. thesis, Duke Univ.
  • Gu, M. and Berger, J. O. (2016). Supplement to “Parallel partial Gaussian process emulation for computer models with massive output.” DOI:10.1214/16-AOAS934SUPP.
  • Gupta, A. K. and Nagar, D. K. (1999). Matrix Variate Distributions. CRC Press, Boca Raton, FL.
  • Higdon, D., Gattiker, J., Williams, B. and Rightley, M. (2008). Computer model calibration using high-dimensional output. J. Amer. Statist. Assoc. 103 570–583.
  • Iooss, B. and Lemaître, P. (2014). A review on global sensitivity analysis methods. Preprint. Available at arXiv:1404.2405.
  • Kaufman, C. G., Schervish, M. J. and Nychka, D. W. (2008). Covariance tapering for likelihood-based estimation in large spatial data sets. J. Amer. Statist. Assoc. 103 1545–1555.
  • Kaufman, C. G., Bingham, D., Habib, S., Heitmann, K. and Frieman, J. A. (2011). Efficient emulators of computer experiments using compactly supported correlation functions, with an application to cosmology. Ann. Appl. Stat. 5 2470–2492.
  • Kazianka, H. and Pilz, J. (2012). Objective Bayesian analysis of spatial data with uncertain nugget and range parameters. Canad. J. Statist. 40 304–327.
  • Kennedy, M. C. and O’Hagan, A. (2001). Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 425–464.
  • Kennedy, M., Anderson, C., O’Hagan, A., Lomas, M., Woodward, I., Gosling, J. P. and Heinemeyer, A. (2008). Quantifying uncertainty in the biospheric carbon flux for England and Wales. J. Roy. Statist. Soc. Ser. A 171 109–135.
  • Lee, L. A., Carslaw, K. S., Pringle, K. J., Mann, G. W. and Spracklen, D. V. (2011). Emulation of a complex global aerosol model to quantify sensitivity to uncertain parameters. Atmos. Chem. Phys. 11 12253–12273.
  • Lee, L. A., Carslaw, K. S., Pringle, K. J. and Mann, G. W. (2012). Mapping the uncertainty in global CCN using emulation. Atmos. Chem. Phys. 12 9739–9751.
  • Li, R. and Sudjianto, A. (2005). Analysis of computer experiments using penalized likelihood in Gaussian kriging models. Technometrics 47 111–120.
  • Lindsay, B. G. (1988). Composite likelihood methods. In Statistical Inference from Stochastic Processes (Ithaca, NY, 1987). Contemp. Math. 80 221–239. Amer. Math. Soc., Providence, RI.
  • Lindsay, B. G., Yi, G. Y. and Sun, J. (2011). Issues and strategies in the selection of composite likelihoods. Statist. Sinica 21 71–105.
  • Linkletter, C., Bingham, D., Hengartner, N., Higdon, D. and Ye, K. Q. (2006). Variable selection for Gaussian process models in computer experiments. Technometrics 48 478–490.
  • Lopes, D. (2011). Development and implementation of Bayesian computer model emulators. Ph.D. thesis, Duke Univ.
  • Marrel, A., Iooss, B., Jullien, M., Laurent, B. and Volkova, E. (2011). Global sensitivity analysis for models with spatially dependent outputs. Environmetrics 22 383–397.
  • Patra, A. K., Bauer, A. C., Nichita, C. C., Pitman, E. B., Sheridan, M. F., Bursik, M., Rupp, B., Webber, A., Stinton, A. J., Namikawa, L. M. et al. (2005). Parallel adaptive numerical simulation of dry avalanches over natural terrain. J. Volcanol. Geotherm. Res. 139 1–21.
  • Paulo, R. (2005). Default priors for Gaussian processes. Ann. Statist. 33 556–582.
  • Paulo, R., García-Donato, G. and Palomo, J. (2012). Calibration of computer models with multivariate output. Comput. Statist. Data Anal. 56 3959–3974.
  • Pitman, E. B., Nichita, C. C., Patra, A., Bauer, A., Sheridan, M. and Bursik, M. (2003). Computing granular avalanches and landslides. Phys. Fluids 15 3638–3646.
  • Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA.
  • Ren, C., Sun, D. and He, C. (2012). Objective Bayesian analysis for a spatial model with nugget effects. J. Statist. Plann. Inference 142 1933–1946.
  • Rougier, J. (2008). Efficient emulators for multivariate deterministic functions. J. Comput. Graph. Statist. 17 827–843.
  • Rougier, J., Guillas, S., Maute, A. and Richmond, A. D. (2009). Expert knowledge and multivariate emulation: The thermosphere-ionosphere electrodynamics general circulation model (TIE-GCM). Technometrics 51 414–424.
  • Roustant, O., Ginsbourger, D. and Deville, Y. (2012). DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. J. Stat. Softw. 51 1–55.
  • Sacks, J., Welch, W. J., Mitchell, T. J. and Wynn, H. P. (1989). Design and analysis of computer experiments. Statist. Sci. 4 409–435.
  • Savitsky, T., Vannucci, M. and Sha, N. (2011). Variable selection for nonparametric Gaussian process priors: Models and computational strategies. Statist. Sci. 26 130–149.
  • Severini, T. A. (2000). Likelihood Methods in Statistics. Oxford Statistical Science Series 22. Oxford Univ. Press, Oxford.
  • Spiller, E. T., Bayarri, M. J., Berger, J. O., Calder, E. S., Patra, A. K., Pitman, E. B. and Wolpert, R. L. (2014). Automating emulator construction for geophysical hazard maps. SIAM/ASA J. Uncertain. Quantificat. 2 126–152.
  • Varin, C., Reid, N. and Firth, D. (2011). An overview of composite likelihood methods. Statist. Sinica 21 5–42.
  • Xiao, M., Breitkopf, P., Filomeno Coelho, R., Knopf-Lenoir, C., Sidorkiewicz, M. and Villon, P. (2010). Model reduction by CPOD and Kriging: Application to the shape optimization of an intake port. Struct. Multidiscip. Optim. 41 555–574.
  • Zhang, H. (2004). Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. J. Amer. Statist. Assoc. 99 250–261.

Supplemental materials

  • Supplement to “Parallel partial Gaussian process emulation for computer models with massive output”. This supplement consists of three parts. The first part describes the “periodic folding” method for modeling the correlation between periodic inputs. The second part provides numerical results showing that the PP GaSP emulator with a nugget is close to being an interpolator for the TITAN2D computer model. The third part discusses a prior for smoothing the draws of the PP GaSP emulator through block sampling.