Bayesian Analysis

Learning Semiparametric Regression with Missing Covariates Using Gaussian Process Models

Abhishek Bishoyi, Xiaojing Wang, and Dipak K. Dey

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access


Missing data often arise as a practical problem when classical models are applied in statistical analysis. In this paper, we consider a semiparametric regression model with missing covariates in the nonparametric components under a Bayesian framework. Gaussian processes are a popular tool in nonparametric regression because of their flexibility and because much of the ensuing computation is parametric Gaussian computation. However, when covariates are missing, the most frequently used covariance functions of a Gaussian process are not well defined. We propose an imputation method to solve this issue and perform our analysis using Bayesian inference, specifying objective priors on the parameters of the Gaussian process models. Several simulations are conducted to illustrate the effectiveness of the proposed method, and the method is further exemplified with two real datasets: one involving the Langmuir equation, commonly used in pharmacokinetic models, and the Auto-mpg data taken from the StatLib library.
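To see why a missing covariate breaks a standard covariance function, the minimal sketch below evaluates a squared-exponential kernel on a hypothetical one-dimensional covariate with one missing value. The kernel choice, its parameters, and the data are illustrative assumptions. The naive mean imputation at the end only shows that the kernel becomes well defined once a value is filled in. It is NOT the Bayesian imputation method proposed in the paper.

```python
import numpy as np

def sq_exp_kernel(x1, x2, variance=1.0, length_scale=1.0):
    """Squared-exponential covariance k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2)) for 1-D inputs."""
    diff = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

# Hypothetical covariate of the nonparametric component; the third value is missing.
z = np.array([0.1, 0.5, np.nan, 1.2])

K = sq_exp_kernel(z, z)
# Every entry involving the missing covariate is NaN, so K is not a valid covariance matrix.
print(np.isnan(K).any())  # True

# Naive mean imputation: an illustrative placeholder, not the paper's method.
z_imp = np.where(np.isnan(z), np.nanmean(z), z)
K_imp = sq_exp_kernel(z_imp, z_imp)
print(np.isnan(K_imp).any())  # False
```

A single plug-in value ignores the uncertainty about the missing covariate. A fully Bayesian treatment, as the abstract describes, instead samples the missing values jointly with the model parameters.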

Article information

Bayesian Anal., Advance publication (2018), 25 pages.

First available in Project Euclid: 9 April 2019

Primary: 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]
Secondary: 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]

Keywords: Gaussian processes; missing at random; missing covariates; nonparametric regression; semiparametric regression

Creative Commons Attribution 4.0 International License.


Bishoyi, Abhishek; Wang, Xiaojing; Dey, Dipak K. Learning Semiparametric Regression with Missing Covariates Using Gaussian Process Models. Bayesian Anal., advance publication, 9 April 2019. doi:10.1214/18-BA1136.

  • Adler, R. J. (1990). “An introduction to continuity, extrema, and related topics for general Gaussian processes.” Lecture Notes-Monograph Series, 12: i–155.
  • Beal, M. J. (2003). Variational algorithms for approximate Bayesian inference. University of London, London.
  • Berger, J. O., Oliveira, V. D., and Sansó, B. (2001). “Objective Bayesian Analysis of Spatially Correlated Data.” Journal of the American Statistical Association, 96(456): 1361–1374.
  • Bishoyi, A., Wang, X., and Dey, D. K. (2019). “Supplementary Materials for “Learning Semiparametric Regression with Missing Covariates Using Gaussian Process Models”.” Bayesian Analysis.
  • Brooks, S. P. and Gelman, A. (1998). “General methods for monitoring convergence of iterative simulations.” Journal of Computational and Graphical Statistics, 7(4): 434–455.
  • Celeux, G., Forbes, F., Robert, C. P., Titterington, D. M., et al. (2006). “Deviance information criteria for missing data models.” Bayesian Analysis, 1(4): 651–673.
  • Choi, T. and Schervish, M. J. (2007). “On posterior consistency in nonparametric regression problems.” Journal of Multivariate Analysis, 98(10): 1969–1987.
  • Cramér, H. and Leadbetter, M. R. (2013). Stationary and related stochastic processes: Sample function properties and their applications. Courier Corporation.
  • Damianou, A. and Lawrence, N. D. (2015). “Semi-described and semi-supervised learning with Gaussian processes.” arXiv preprint arXiv:1509.01168.
  • Denison, D. G. (2002). Bayesian methods for nonlinear classification and regression, volume 386. John Wiley & Sons.
  • Dey, D. K., Chen, M.-H., and Chang, H. (1997). “Bayesian Approach for Nonlinear Random Effects Models.” Biometrics, 53(4): 1239–1252.
  • Engle, R. F., Granger, C. W. J., Rice, J., and Weiss, A. (1986). “Semiparametric Estimates of the Relation Between Weather and Electricity Sales.” Journal of the American Statistical Association, 81(394): 310–320.
  • Faes, C., Ormerod, J. T., and Wand, M. P. (2011). “Variational Bayesian Inference for Parametric and Nonparametric Regression With Missing Data.” Journal of the American Statistical Association, 106(495): 959–971.
  • Girard, A. and Murray-Smith, R. (2003). “Learning a Gaussian process model with uncertain inputs.” Technical report, Department of Computing Science, University of Glasgow.
  • Härdle, W. and Liang, H. (2007). Partially Linear Models, 87–103. Berlin, Heidelberg: Springer, Berlin Heidelberg.
  • Langmuir, I. (1918). “The Adsorption of Gases on Plane Surfaces of Glass, Mica and Platinum.” Journal of the American Chemical Society, 40(9): 1361–1403.
  • Liao, X., Li, H., and Carin, L. (2007). “Quadratically gated mixture of experts for incomplete data classification.” In Proceedings of the 24th International Conference on Machine learning, 553–560. ACM.
  • Little, R. J. and Rubin, D. B. (2002). Statistical Analysis with Missing Data. John Wiley & Sons.
  • Mahle, J. J., Buettner, L. C., and Friday, D. K. (1994). “Measurement and correlation of the adsorption equilibria of refrigerant vapors on activated carbon.” Industrial & Engineering Chemistry Research, 33(2): 346–354.
  • Neal, R. M. (2003). “Slice Sampling.” The Annals of Statistics, 31(3): 705–741.
  • Quiñonero-Candela, J. and Roweis, S. T. (2003). “Data imputation and robust training with Gaussian processes.” Technical report, Citeseer.
  • Rasmussen, C. E. and Williams, C. K. (2006). Gaussian processes for machine learning. The MIT Press.
  • Ren, C., Sun, D., and He, C. (2012). “Objective Bayesian analysis for a spatial model with nugget effects.” Journal of Statistical Planning and Inference, 142(7): 1933–1946.
  • Ruppert, D., Wand, M. P., and Carroll, R. J. (2003). Semiparametric Regression, volume 12. Cambridge University Press.
  • Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). “Bayesian measures of model complexity and fit.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4): 583–639.
  • Takezawa, K. (2005). Introduction to nonparametric regression, volume 606. John Wiley & Sons.
  • van Buuren, S. and Groothuis-Oudshoorn, K. (2011). “mice: Multivariate Imputation by Chained Equations in R.” Journal of Statistical Software, 45(3): 1–67.
  • Van Der Vaart, A. W. and Wellner, J. A. (1996). “Weak Convergence.” In Weak Convergence and Empirical Processes, 16–28. Springer.
  • Wang, C., Liao, X., Carin, L., and Dunson, D. B. (2010). “Classification with incomplete data using Dirichlet process priors.” Journal of Machine Learning Research, 11(Dec): 3269–3311.
  • Yau, P. and Kohn, R. (2003). “Estimation and variable selection in nonparametric heteroscedastic regression.” Statistics and Computing, 13(3): 191–208.
  • Zhang, X., Song, S., Zhu, L., You, K., and Wu, C. (2016). “Unsupervised learning of Dirichlet process mixture models with missing data.” Science China Information Sciences, 59(1): 1–14.

Supplemental materials

  • Supplementary Materials for “Learning Semiparametric Regression with Missing Covariates Using Gaussian Process Models”. Section S.1 of the supplement restates the four conditions used in Ren et al. (2012), and Section S.2 gives the derivation of the conditional distribution of $\mathbf{x}^{mis}$ given $\mathbf{x}^{obs}$. Section S.3 reports the detailed MSEx, PMSE, and DIC results for the different covariance kernels in Simulation II of Section 4.2. Sections S.4 and S.5 give the MCMC sampling schemes for the Langmuir model estimation and for the log model estimation of Section 5.2, respectively. See more details in Supplement S (
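As a generic illustration of what a conditional distribution of $\mathbf{x}^{mis}$ given $\mathbf{x}^{obs}$ looks like, the sketch below assumes the covariate vector is jointly multivariate normal, $\mathbf{x} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. It then applies the standard partitioned-Gaussian formulas $\boldsymbol{\mu}_m + \boldsymbol{\Sigma}_{mo}\boldsymbol{\Sigma}_{oo}^{-1}(\mathbf{x}^{obs} - \boldsymbol{\mu}_o)$ and $\boldsymbol{\Sigma}_{mm} - \boldsymbol{\Sigma}_{mo}\boldsymbol{\Sigma}_{oo}^{-1}\boldsymbol{\Sigma}_{om}$. The normality assumption, the function name `conditional_gaussian`, and the numbers are hypothetical. This is not the derivation given in Section S.2, which may rest on a different model.

```python
import numpy as np

def conditional_gaussian(mu, Sigma, x, missing):
    """Mean and covariance of x[missing] given x[~missing], assuming x ~ N(mu, Sigma)."""
    missing = np.asarray(missing, dtype=bool)
    obs = ~missing
    S_mo = Sigma[np.ix_(missing, obs)]
    S_oo = Sigma[np.ix_(obs, obs)]
    S_mm = Sigma[np.ix_(missing, missing)]
    # A = S_mo S_oo^{-1}, computed with a linear solve rather than an explicit inverse.
    A = np.linalg.solve(S_oo, S_mo.T).T
    cond_mean = mu[missing] + A @ (x[obs] - mu[obs])
    cond_cov = S_mm - A @ S_mo.T
    return cond_mean, cond_cov

# Hypothetical bivariate example: first covariate missing, second observed at 2.0.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
x = np.array([np.nan, 2.0])
m, C = conditional_gaussian(mu, Sigma, x, missing=[True, False])
print(m, C)  # conditional mean 1.0, conditional variance 0.75
```

In an MCMC scheme, a draw from this conditional distribution would replace the missing entries at each iteration, so that the Gaussian process covariance matrix is always evaluated at fully observed inputs.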