The Annals of Applied Statistics

Semiparametric empirical best prediction for small area estimation of unemployment indicators

Maria Francesca Marino, Maria Giovanna Ranalli, Nicola Salvati, and Marco Alfò

Abstract

The Italian National Institute for Statistics regularly provides estimates of unemployment indicators using data from the labor force survey. However, direct estimates of unemployment incidence cannot be released for local labor market areas. These are unplanned domains defined as clusters of municipalities; many are out-of-sample areas, and most have sample sizes so small that direct estimates are inadequate. The empirical best predictor represents an appropriate model-based alternative. However, for non-Gaussian responses, computing it and the analytic approximation to its mean squared error requires solving (possibly) multiple integrals that generally do not have a closed form. Monte Carlo methods and the parametric bootstrap are common ways to address this issue, although their computational burden is nontrivial. In this paper, we propose a semiparametric empirical best predictor (sp-EBP) for a (possibly) nonlinear mixed effect model, which leaves the distribution of the area-specific random effects unspecified and estimates it from the observed data. This approach is known to lead to a discrete mixing distribution, which helps avoid unverifiable parametric assumptions and heavy integral approximations. We also derive a second-order, bias-corrected analytic approximation to the corresponding mean squared error. Finite-sample properties of the proposed approach are tested via a large-scale simulation study. Furthermore, the proposal is applied to unit-level data from the 2012 Italian Labor Force Survey to estimate unemployment incidence for 611 local labor market areas, using auxiliary information from administrative registers and the 2011 Census.
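To make the computational point concrete: once the random-effect distribution is discrete, the conditional expectation that defines the best predictor becomes a finite weighted sum rather than an integral. The sketch below assumes a unit-level mixed logistic model with a random intercept, logit P(y_ij = 1 | u_i) = x_ij'β + u_i, where u_i takes the value ξ_k with probability π_k (k = 1, ..., K), and treats the area target as the proportion of ones (for example, unemployed people) among the N_i units of area i. The function names, and the assumption that β, ξ_k and π_k have already been estimated, are choices of this sketch rather than the authors' implementation; the MSE approximation is not covered.

```python
import math
from typing import List, Sequence


def expit(eta: float) -> float:
    """Inverse logit, computed without overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_bernoulli(y: int, eta: float) -> float:
    """Log-probability of a 0/1 response whose success probability is expit(eta)."""
    return -softplus(-eta) if y == 1 else -softplus(eta)


def posterior_weights(y: Sequence[int], eta_fixed: Sequence[float],
                      support: Sequence[float], masses: Sequence[float]) -> List[float]:
    """Posterior probabilities that the area's random intercept equals each xi_k.

    y, eta_fixed: sampled 0/1 responses of one area and their fixed part x_ij'beta.
    support, masses: assumed already-estimated discrete mixing distribution
    (xi_k, pi_k), with every pi_k > 0.
    """
    log_w = []
    for xi, pi in zip(support, masses):
        log_w.append(math.log(pi) + sum(log_bernoulli(yij, eta + xi)
                                        for yij, eta in zip(y, eta_fixed)))
    top = max(log_w)  # log-sum-exp shift avoids underflow
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def sp_ebp_area_mean(y_sampled: Sequence[int], eta_sampled: Sequence[float],
                     eta_nonsampled: Sequence[float], support: Sequence[float],
                     masses: Sequence[float]) -> float:
    """Predicted proportion of ones over all N_i units of one area.

    Observed responses are kept as they are; each non-sampled unit contributes
    sum_k tau_k * expit(x_ij'beta + xi_k), a finite sum instead of an integral.
    """
    tau = posterior_weights(y_sampled, eta_sampled, support, masses)
    predicted = sum(
        sum(t * expit(eta + xi) for t, xi in zip(tau, support))
        for eta in eta_nonsampled
    )
    n_total = len(y_sampled) + len(eta_nonsampled)
    return (sum(y_sampled) + predicted) / n_total


if __name__ == "__main__":
    support, masses = [-1.0, 0.5], [0.6, 0.4]  # illustrative values only
    # Sampled area: 2 observed units and 3 non-sampled units.
    print(sp_ebp_area_mean([1, 0], [0.2, -0.1], [0.0, 0.3, -0.4], support, masses))
    # Out-of-sample area: no observed units, so the weights equal the masses.
    print(sp_ebp_area_mean([], [], [0.0, 0.3, -0.4], support, masses))
```

For an out-of-sample area there are no observed responses, so the weights reduce to the estimated masses π_k; for a sampled area they are the posterior probabilities of each support point given that area's data, which is how sample information is borrowed into the prediction.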

Article information

Source
Ann. Appl. Stat., Volume 13, Number 2 (2019), 1166-1197.

Dates
Received: December 2017
Revised: August 2018
First available in Project Euclid: 17 June 2019

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1560758442

Digital Object Identifier
doi:10.1214/18-AOAS1226

Mathematical Reviews number (MathSciNet)
MR3963567

Zentralblatt MATH identifier
07094850

Keywords
Binary data; Exponential Family; finite mixtures; general parameters; mixed logistic model; unit-level model

Citation

Marino, Maria Francesca; Ranalli, Maria Giovanna; Salvati, Nicola; Alfò, Marco. Semiparametric empirical best prediction for small area estimation of unemployment indicators. Ann. Appl. Stat. 13 (2019), no. 2, 1166–1197. doi:10.1214/18-AOAS1226. https://projecteuclid.org/euclid.aoas/1560758442

Supplemental materials

  • Supplement to “Semiparametric empirical best prediction for small area estimation of unemployment indicators”. The online Supplementary Material describes the EM algorithm for parameter estimation and the procedure for estimating the covariance matrix of the model parameters. It also reports computational details for deriving the bias-correction term of the MSE estimator for the proposed sp-EBP, together with explicit formulas for the model derivatives in the case of binary data, and presents additional simulation results. Finally, it provides a computationally efficient R implementation of the estimation and inference procedures, developed by the authors, together with an example data set.
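The Supplement's EM algorithm is not reproduced here. As a hedged illustration of how a discrete mixing distribution can be estimated from data, the following sketch fits the simplest possible case: there are no covariates, so area i contributes y_i unemployed out of n_i sampled people, and the area random intercept takes one of K values ξ_k with masses π_k. Under these assumptions the M-step has a closed form; with covariates, as in a unit-level model, the regression coefficients would instead need an iterative weighted update inside each M-step. The function name em_discrete_mixing, the fixed K and the starting values are assumptions of this sketch.

```python
import math
from typing import List, Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def em_discrete_mixing(successes: Sequence[int], trials: Sequence[int],
                       init_probs: Sequence[float], max_iter: int = 500,
                       tol: float = 1e-10) -> Tuple[List[float], List[float], float]:
    """Hypothetical EM sketch for y_i ~ Binomial(n_i, expit(xi_k)), P(xi = xi_k) = pi_k.

    Returns the masses pi_k, the support points xi_k (logit scale) and the
    log-likelihood at the last E-step, up to the binomial coefficients, which
    do not depend on the parameters. init_probs must lie strictly in (0, 1).
    """
    eps = 1e-12
    K = len(init_probs)
    probs = list(init_probs)
    masses = [1.0 / K] * K
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(max_iter):
        # E-step: posterior weight of each support point for each area.
        tau, ll = [], 0.0
        for y, n in zip(successes, trials):
            log_w = [math.log(max(masses[k], eps)) + y * math.log(probs[k])
                     + (n - y) * math.log(1.0 - probs[k]) for k in range(K)]
            top = max(log_w)
            w = [math.exp(v - top) for v in log_w]
            s = sum(w)
            ll += top + math.log(s)
            tau.append([v / s for v in w])
        # M-step: closed-form updates, available only because there are no covariates.
        for k in range(K):
            tk = [row[k] for row in tau]
            masses[k] = sum(tk) / len(tau)
            den = sum(t * n for t, n in zip(tk, trials))
            if den > 0:
                p = sum(t * y for t, y in zip(tk, successes)) / den
                probs[k] = min(max(p, eps), 1.0 - eps)  # keep the logs finite
        if ll - prev_ll < tol:  # EM never decreases the likelihood
            break
        prev_ll = ll
    return masses, [logit(p) for p in probs], ll


if __name__ == "__main__":
    # Synthetic illustration: 20 areas with 50 sampled people each.
    y = [1] * 10 + [40] * 10
    n = [50] * 20
    masses, support, ll = em_discrete_mixing(y, n, init_probs=[0.3, 0.6])
    print(masses, support, ll)
```

Fixing K in advance is a simplification of this sketch; in a full analysis the number of support points would also have to be chosen from the data.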