Statistical Science

Model-Assisted Survey Estimation with Modern Prediction Techniques

F. Jay Breidt and Jean D. Opsomer



This paper reviews the design-based, model-assisted approach to using data from a complex survey together with auxiliary information to estimate finite population parameters. A general recipe for deriving model-assisted estimators is presented and design-based asymptotic analysis for such estimators is reviewed. The recipe allows for a very broad class of prediction methods, with examples from the literature including linear models, linear mixed models, nonparametric regression and machine learning techniques.
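The abstract does not spell out the recipe itself. As a minimal sketch, assume the standard generalized difference form from the model-assisted literature: predictions from a working model are summed over the whole population, and the inverse-probability-weighted sample residuals are added as a design-based correction. The function names, the single auxiliary variable and the survey-weighted linear working model below are illustrative assumptions, not the authors' notation or code. In principle, any prediction method could replace the linear fit.

```python
import numpy as np


def horvitz_thompson_total(y_sample, pi_sample):
    """Design-weighted (Horvitz-Thompson) estimate of a population total."""
    y_sample = np.asarray(y_sample, dtype=float)
    pi_sample = np.asarray(pi_sample, dtype=float)
    return float(np.sum(y_sample / pi_sample))


def model_assisted_total(x_pop, sample_idx, y_sample, pi_sample):
    """Generalized difference estimate of a total (illustrative sketch).

    Assumes the auxiliary variable x is known for every population unit,
    and uses a survey-weighted linear fit as the working model.
    """
    x_pop = np.asarray(x_pop, dtype=float)
    y_sample = np.asarray(y_sample, dtype=float)
    w = 1.0 / np.asarray(pi_sample, dtype=float)  # design weights 1 / pi_k

    # Design matrix with intercept for the population and for the sample.
    X_pop = np.column_stack([np.ones_like(x_pop), x_pop])
    X_s = X_pop[sample_idx]

    # Survey-weighted least squares: solve (X' W X) beta = X' W y.
    XtW = X_s.T * w
    beta = np.linalg.solve(XtW @ X_s, XtW @ y_sample)

    # Sum of predictions over the population, plus the design-weighted
    # sum of sample residuals.
    pred_pop = X_pop @ beta
    resid = y_sample - pred_pop[sample_idx]
    return float(pred_pop.sum() + np.sum(w * resid))
```

When the working model fits well, the residuals are small and the correction term contributes little variance. When it fits poorly, the correction term is what protects the estimator against model misspecification under the sampling design.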

Article information

Statist. Sci., Volume 32, Number 2 (2017), 190-205.

First available in Project Euclid: 11 May 2017


Keywords: machine learning; nonparametric regression; nearest neighbors; neural network; regression trees; survey asymptotics


Breidt, F. Jay; Opsomer, Jean D. Model-Assisted Survey Estimation with Modern Prediction Techniques. Statist. Sci. 32 (2017), no. 2, 190–205. doi:10.1214/16-STS589.


