The Annals of Applied Statistics

Estimating large correlation matrices for international migration

Jonathan J. Azose and Adrian E. Raftery

Full-text: Open access


The United Nations is the major organization producing and regularly updating probabilistic population projections for all countries. International migration is a critical component of such projections, and between-country correlations are important for forecasts of regional aggregates. However, in the data we consider there are 200 countries and only 12 data points, each one corresponding to a five-year time period. Thus a $200\times200$ correlation matrix must be estimated on the basis of 12 data points. Using Pearson correlations produces many spurious correlations. We propose a maximum a posteriori estimator for the correlation matrix with an interpretable informative prior distribution. The prior serves to regularize the correlation matrix, shrinking a priori untrustworthy elements towards zero. Our estimated correlation structure improves projections of net migration for regional aggregates, producing narrower projections of migration for Africa as a whole and wider projections for Europe. A simulation study confirms that our estimator outperforms both the Pearson correlation matrix and a simple shrinkage estimator when estimating a sparse correlation matrix.
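The spurious-correlation problem described above can be seen in a small simulation. The following is a minimal sketch, not the authors' estimator: it draws 200 mutually independent series of 12 points each (so every true between-series correlation is zero), computes all pairwise Pearson correlations, and then applies a uniform shrinkage of the off-diagonal entries towards zero as a stand-in for a simple shrinkage estimator. The function names, the Gaussian data, the seed and the shrinkage weight of 0.5 are all illustrative assumptions.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_offdiag_correlations(n_countries=200, n_periods=12, seed=0):
    """Pearson correlations for every pair of independent simulated series.

    Assumption: i.i.d. standard normal data, so the true correlation
    of every pair is exactly zero.
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0.0, 1.0) for _ in range(n_periods)]
              for _ in range(n_countries)]
    return [pearson(series[i], series[j])
            for i, j in combinations(range(n_countries), 2)]


def shrink_towards_zero(r, weight):
    """Off-diagonal entry of (1 - weight) * R + weight * I.

    This is uniform shrinkage towards the identity matrix, used here
    only for illustration; it is not the MAP estimator of the paper.
    """
    return (1.0 - weight) * r


corrs = simulate_offdiag_correlations()
frac_large = sum(abs(r) > 0.5 for r in corrs) / len(corrs)
max_abs = max(abs(r) for r in corrs)
shrunk = [shrink_towards_zero(r, 0.5) for r in corrs]

print(f"pairs: {len(corrs)}")
print(f"share of pairs with |r| > 0.5: {frac_large:.3f}")
print(f"largest |r| before shrinkage: {max_abs:.3f}")
print(f"largest |r| after shrinkage:  {max(abs(s) for s in shrunk):.3f}")
```

Even though no pair is truly correlated, a noticeable share of the sample correlations is large in absolute value, which is the behaviour the abstract attributes to Pearson correlations with only 12 data points. Uniform shrinkage damps every entry equally; the informative prior described in the abstract instead shrinks a priori untrustworthy elements more strongly.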

Article information

Ann. Appl. Stat., Volume 12, Number 2 (2018), 940-970.

Received: November 2017
Revised: April 2018
First available in Project Euclid: 28 July 2018


Keywords: correlation estimation; international migration; maximum a posteriori estimation; high-dimension


Azose, Jonathan J.; Raftery, Adrian E. Estimating large correlation matrices for international migration. Ann. Appl. Stat. 12 (2018), no. 2, 940–970. doi:10.1214/18-AOAS1175.


