• Bernoulli
  • Volume 24, Number 2 (2018), 1531-1575.

Asymptotic analysis of covariance parameter estimation for Gaussian processes in the misspecified case

François Bachoc



In parametric estimation of the covariance function of a Gaussian process, it is often the case that the true covariance function does not belong to the parametric set used for estimation. This situation is called the misspecified case. In this case, it has been shown that, for irregular spatial sampling of observation points, Cross Validation can yield smaller prediction errors than Maximum Likelihood. Motivated by this observation, we provide a general asymptotic analysis of the misspecified case, for independent and uniformly distributed observation points. We prove that the Maximum Likelihood estimator asymptotically minimizes a Kullback–Leibler divergence, within the misspecified parametric set, while Cross Validation asymptotically minimizes the integrated square prediction error. In Monte Carlo simulations, we show that the covariance parameters estimated by Maximum Likelihood and Cross Validation, and the corresponding Kullback–Leibler divergences and integrated square prediction errors, can be strongly contrasting. On a more technical level, we provide new increasing-domain asymptotic results for independent and uniformly distributed observation points.
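
The contrast between the two estimators can be illustrated with a minimal sketch. It is not the paper's experimental setup: the exponential model family, the Matérn 3/2 "true" covariance, the grid search, and all function names (`exp_cov`, `neg_log_likelihood`, `loo_mse`, `estimate`, `demo`) are assumptions chosen for illustration. The leave-one-out errors use the standard closed form for zero-mean Kriging, in which the i-th error equals the i-th entry of K⁻¹y divided by the i-th diagonal entry of K⁻¹.

```python
import numpy as np


def exp_cov(x, length_scale, variance=1.0):
    """Exponential covariance matrix for 1-D points x (assumed model family)."""
    d = np.abs(x[:, None] - x[None, :])
    return variance * np.exp(-d / length_scale)


def neg_log_likelihood(y, K):
    """Gaussian likelihood criterion up to constants: (log det K + y' K^-1 y) / n."""
    n = len(y)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)  # alpha @ alpha equals y' K^-1 y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return (log_det + alpha @ alpha) / n


def loo_mse(y, K):
    """Leave-one-out mean square prediction error via the closed form."""
    K_inv = np.linalg.inv(K)
    errors = (K_inv @ y) / np.diag(K_inv)
    return np.mean(errors ** 2)


def estimate(y, x, criterion, grid):
    """Return the length scale in `grid` that minimizes `criterion`."""
    scores = [criterion(y, exp_cov(x, ell)) for ell in grid]
    return grid[int(np.argmin(scores))]


def demo(seed=0, n=100):
    """Fit the exponential model to data drawn from a Matérn 3/2 process."""
    rng = np.random.default_rng(seed)
    # Independent, uniformly distributed points on a domain that grows with n.
    x = rng.uniform(0.0, float(n), size=n)
    d = np.abs(x[:, None] - x[None, :])
    # Assumed true covariance (Matérn 3/2), outside the exponential family.
    K_true = (1.0 + np.sqrt(3.0) * d) * np.exp(-np.sqrt(3.0) * d)
    # Small jitter keeps the Cholesky factorization numerically stable.
    y = np.linalg.cholesky(K_true + 1e-10 * np.eye(n)) @ rng.standard_normal(n)
    grid = np.linspace(0.1, 5.0, 50)
    ell_ml = estimate(y, x, neg_log_likelihood, grid)
    ell_cv = estimate(y, x, loo_mse, grid)
    return ell_ml, ell_cv


if __name__ == "__main__":
    ell_ml, ell_cv = demo()
    print("ML length scale:", ell_ml)
    print("CV length scale:", ell_cv)
```

In this sketch the variance is held fixed at 1 and only the length scale is estimated, so the two criteria can select different values on the same data. How far apart they land depends on the seed and on the assumed covariances.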

Article information

Bernoulli, Volume 24, Number 2 (2018), 1531-1575.

Received: November 2015
Revised: June 2016
First available in Project Euclid: 21 September 2017


Keywords: covariance parameter estimation; cross validation; Gaussian processes; increasing-domain asymptotics; integrated square prediction error; Kullback–Leibler divergence; maximum likelihood


Bachoc, François. Asymptotic analysis of covariance parameter estimation for Gaussian processes in the misspecified case. Bernoulli 24 (2018), no. 2, 1531–1575. doi:10.3150/16-BEJ906.

References


  • [1] Abrahamsen, P. (1997). A review of Gaussian random fields and correlation functions. Technical report, Norwegian computing center.
  • [2] Anderes, E. (2010). On the consistent separation of scale and variance for Gaussian random fields. Ann. Statist. 38 870–893.
  • [3] Andrianakis, I. and Challenor, P.G. (2012). The effect of the nugget on Gaussian process emulators of computer models. Comput. Statist. Data Anal. 56 4215–4228.
  • [4] Azencott, R. and Dacunha-Castelle, D. (1986). Series of Irregular Observations: Forecasting and Model Building. Applied Probability. A Series of the Applied Probability Trust. New York: Springer.
  • [5] Bachoc, F. (2013). Cross validation and maximum likelihood estimations of hyper-parameters of Gaussian processes with model misspecification. Comput. Statist. Data Anal. 66 55–69.
  • [6] Bachoc, F. (2013). Parametric estimation of covariance function in Gaussian-process based Kriging models. Application to uncertainty quantification for computer experiments. Ph.D. thesis, Université Paris-Diderot – Paris VII. Available at
  • [7] Bachoc, F. (2014). Asymptotic analysis of the role of spatial sampling for covariance parameter estimation of Gaussian processes. J. Multivariate Anal. 125 1–35.
  • [8] Bachoc, F. (2016). Supplement to “Asymptotic analysis of covariance parameter estimation for Gaussian processes in the misspecified case.” DOI:10.3150/16-BEJ906SUPP.
  • [9] Bachoc, F., Bois, G., Garnier, J. and Martinez, J.M. (2014). Calibration and improved prediction of computer models by universal Kriging. Nucl. Sci. Eng. 176 81–97.
  • [10] Chevalier, C., Ginsbourger, D., Bect, J., Vazquez, E., Picheny, V. and Richet, Y. (2014). Fast parallel Kriging-based stepwise uncertainty reduction with application to the identification of an excursion set. Technometrics 56 455–465.
  • [11] Conti, S. and O’Hagan, A. (2010). Bayesian emulation of complex multi-output and dynamic computer models. J. Statist. Plann. Inference 140 640–651.
  • [12] Cressie, N. and Lahiri, S.N. (1993). The asymptotic distribution of REML estimators. J. Multivariate Anal. 45 217–233.
  • [13] Cressie, N. and Lahiri, S.N. (1996). Asymptotics for REML estimation of spatial covariance parameters. J. Statist. Plann. Inference 50 327–341.
  • [14] Dubrule, O. (1983). Cross validation of Kriging in a unique neighborhood. J. Int. Assoc. Math. Geol. 15 687–699.
  • [15] Furrer, R., Bachoc, F. and Du, J. (2016). Asymptotic properties of multivariate tapering for estimation and prediction. J. Multivariate Anal. 149 177–191.
  • [16] Furrer, R., Genton, M.G. and Nychka, D. (2006). Covariance tapering for interpolation of large spatial datasets. J. Comput. Graph. Statist. 15 502–523.
  • [17] Gneiting, T. (2011). Making and evaluating point forecasts. J. Amer. Statist. Assoc. 106 746–762.
  • [18] Gneiting, T. and Raftery, A.E. (2007). Strictly proper scoring rules, prediction, and estimation. J. Amer. Statist. Assoc. 102 359–378.
  • [19] Gramacy, R.B. and Apley, D.W. (2015). Local Gaussian process approximation for large computer experiments. J. Comput. Graph. Statist. 24 561–578.
  • [20] Gray, R.M. (2006). Toeplitz and circulant matrices: A review. Found. Trends Commun. Inf. Theory 2 155–239.
  • [21] Hallin, M., Lu, Z. and Yu, K. (2009). Local linear spatial quantile regression. Bernoulli 15 659–686.
  • [22] Handcock, M.S. and Wallis, J.R. (1994). An approach to statistical spatial-temporal modeling of meteorological fields. J. Amer. Statist. Assoc. 89 368–390.
  • [23] Iooss, B., Boussouf, L., Feuillard, V. and Marrel, A. (2010). Numerical studies of the metamodel fitting and validation processes. International Journal of Advances in Systems and Measurements 3 11–21.
  • [24] Jones, D.R., Schonlau, M. and Welch, W.J. (1998). Efficient global optimization of expensive black-box functions. J. Global Optim. 13 455–492.
  • [25] Kou, S.C. (2003). On the efficiency of selection criteria in spline regression. Probab. Theory Related Fields 127 153–176.
  • [26] Lahiri, S.N. (2003). Central limit theorems for weighted sums of a spatial process under a class of stochastic and fixed designs. Sankhyā 65 356–388.
  • [27] Lahiri, S.N. and Mukherjee, K. (2004). Asymptotic distributions of $M$-estimators in a spatial regression model under some fixed and stochastic spatial sampling designs. Ann. Inst. Statist. Math. 56 225–250.
  • [28] Lahiri, S.N. and Robinson, P.M. (2016). Central limit theorems for long range dependent spatial linear processes. Bernoulli 22 345–375.
  • [29] Lahiri, S.N. and Zhu, J. (2006). Resampling methods for spatial regression models under a class of stochastic designs. Ann. Statist. 34 1774–1813.
  • [30] Le Gratiet, L. and Garnier, J. (2014). Asymptotic analysis of the learning curve for Gaussian process regression. Mach. Learn. 1–27.
  • [31] Loh, W.-L. (2005). Fixed-domain asymptotics for a subclass of Matérn-type Gaussian random fields. Ann. Statist. 33 2344–2394.
  • [32] Mardia, K.V. and Marshall, R.J. (1984). Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71 135–146.
  • [33] Marrel, A., Iooss, B., Van Dorpe, F. and Volkova, E. (2008). An efficient methodology for modeling complex computer codes with Gaussian processes. Comput. Statist. Data Anal. 52 4731–4744.
  • [34] Martin, J.D. and Simpson, T.W. (2004). On the use of Kriging models to approximate deterministic computer models. In DETC’04 ASME 2004 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference Salt Lake City, Utah USA, September 28 – October 2, 2004.
  • [35] Paulo, R., García-Donato, G. and Palomo, J. (2012). Calibration of computer models with multivariate output. Comput. Statist. Data Anal. 56 3959–3974.
  • [36] Putter, H. and Young, G.A. (2001). On the effect of covariance function estimation on the accuracy of Kriging predictors. Bernoulli 7 421–438.
  • [37] Rasmussen, C.E. and Williams, C.K.I. (2006). Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. Cambridge, MA: MIT Press.
  • [38] Ripley, B.D. (1981). Spatial Statistics. New York: Wiley.
  • [39] Sacks, J., Welch, W.J., Mitchell, T.J. and Wynn, H.P. (1989). Design and analysis of computer experiments. Statist. Sci. 4 409–423.
  • [40] Santner, T.J., Williams, B.J. and Notz, W.I. (2003). The Design and Analysis of Computer Experiments. Springer Series in Statistics. New York: Springer.
  • [41] Scheuerer, M. (2010). Regularity of the sample paths of a general second order random field. Stochastic Process. Appl. 120 1879–1897.
  • [42] Shi, T., Belkin, M. and Yu, B. (2009). Data spectroscopy: Eigenspaces of convolution operators and clustering. Ann. Statist. 37 3960–3984.
  • [43] Stein, M. (1990). Uniform asymptotic optimality of linear predictions of a random field using an incorrect second-order structure. Ann. Statist. 18 850–872.
  • [44] Stein, M.L. (1988). Asymptotically efficient prediction of a random field with a misspecified covariance function. Ann. Statist. 16 55–63.
  • [45] Stein, M.L. (1990). Bounds on the efficiency of linear predictions using an incorrect covariance function. Ann. Statist. 18 1116–1138.
  • [46] Stein, M.L. (1990). A comparison of generalized cross validation and modified maximum likelihood for estimating the parameters of a stochastic process. Ann. Statist. 18 1139–1157.
  • [47] Stein, M.L. (1993). Spline smoothing with an estimated order parameter. Ann. Statist. 21 1522–1544.
  • [48] Stein, M.L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer Series in Statistics. New York: Springer.
  • [49] Sundararajan, S. and Keerthi, S.S. (2001). Predictive approaches for choosing hyperparameters in Gaussian processes. Neural Comput. 13 1103–1118.
  • [50] Wendland, H. (1995). Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree. Adv. Comput. Math. 4 389–396.
  • [51] White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50 1–25.
  • [52] Ying, Z. (1991). Asymptotic properties of a maximum likelihood estimator with data from a Gaussian process. J. Multivariate Anal. 36 280–296.
  • [53] Ying, Z. (1993). Maximum likelihood estimation of parameters under a spatial sampling scheme. Ann. Statist. 21 1567–1590.
  • [54] Zhang, H. (2004). Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. J. Amer. Statist. Assoc. 99 250–261.
  • [55] Zhang, H. and Wang, Y. (2010). Kriging and cross-validation for massive spatial data. Environmetrics 21 290–304.
  • [56] Zhang, H. and Zimmerman, D.L. (2005). Towards reconciling two asymptotic frameworks in spatial statistics. Biometrika 92 921–936.

Supplemental materials

  • Figures and proofs of the technical results. In the supplementary material [8], we provide Figures 1 and 2, which complement the one-dimensional illustrative Monte Carlo simulation. We also give the proofs of the lemmas stated in Section A.6.