Electronic Journal of Statistics

Hierarchical Bayes, maximum a posteriori estimators, and minimax concave penalized likelihood estimation

Robert L. Strawderman, Martin T. Wells, and Elizabeth D. Schifano

Full-text: Open access


Priors constructed from scale mixtures of normal distributions have long played an important role in decision theory and shrinkage estimation. This paper demonstrates equivalence between the maximum a posteriori estimator constructed under one such prior and Zhang’s minimax concave penalization estimator. This equivalence and related multivariate generalizations stem directly from an intriguing representation of the minimax concave penalty function as the Moreau envelope of a simple convex function. Maximum a posteriori estimation under the corresponding marginal prior distribution, a generalization of the quasi-Cauchy distribution proposed by Johnstone and Silverman, leads to thresholding estimators having excellent frequentist risk properties.
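To make the thresholding behaviour mentioned in the abstract concrete, the following is a minimal sketch of the scalar minimax concave penalty and the thresholding rule it induces. It assumes the standard one-dimensional form of Zhang's penalty with parameters lam > 0 and gamma > 1. The function names (mcp_penalty, mcp_threshold, brute_force_minimizer) are illustrative and do not come from the paper, and the sketch does not reproduce the paper's hierarchical prior or its Moreau envelope argument.

```python
import numpy as np


def mcp_penalty(theta, lam, gamma):
    """Scalar MCP: lam*|t| - t^2/(2*gamma) if |t| <= gamma*lam, else gamma*lam^2/2."""
    a = np.abs(theta)
    return np.where(a <= gamma * lam,
                    lam * a - a ** 2 / (2.0 * gamma),
                    0.5 * gamma * lam ** 2)


def mcp_threshold(z, lam, gamma):
    """Closed-form minimizer of 0.5*(z - theta)^2 + mcp_penalty(theta), assuming gamma > 1.

    Small inputs are set to zero, intermediate inputs are shrunk,
    and large inputs are left unchanged.
    """
    a = np.abs(z)
    middle = np.sign(z) * (a - lam) / (1.0 - 1.0 / gamma)
    return np.where(a <= lam, 0.0, np.where(a <= gamma * lam, middle, z))


def brute_force_minimizer(z, lam, gamma, grid):
    """Grid search over theta, used only to sanity-check the closed form."""
    objective = 0.5 * (z - grid) ** 2 + mcp_penalty(grid, lam, gamma)
    return grid[np.argmin(objective)]


if __name__ == "__main__":
    grid = np.linspace(-6.0, 6.0, 120001)
    for z in [-4.0, -0.5, 1.5, 5.0]:
        closed = float(mcp_threshold(z, 1.0, 3.0))
        numeric = float(brute_force_minimizer(z, 1.0, 3.0, grid))
        print(z, closed, numeric)
```

The grid search is there only as a check: under the stated assumptions, the closed-form rule and the numerical minimizer should agree up to the grid resolution.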

Article information

Electron. J. Statist., Volume 7 (2013), 973-990.

First available in Project Euclid: 15 April 2013

Primary: 62C60 62J07: Ridge regression; shrinkage estimators

Convex optimization; Lasso penalty; Moreau regularization; minimax concave penalty; sparsity; smoothly clipped absolute deviation penalty; thresholding


Strawderman, Robert L.; Wells, Martin T.; Schifano, Elizabeth D. Hierarchical Bayes, maximum a posteriori estimators, and minimax concave penalized likelihood estimation. Electron. J. Statist. 7 (2013), 973–990. doi:10.1214/13-EJS795. https://projecteuclid.org/euclid.ejs/1366031047


  • [1] Abramowitz, M. and Stegun, I. (1970). Handbook of mathematical functions. Dover Publications Inc., New York.
  • [2] Antoniadis, A. and Fan, J. (2001). Regularization of Wavelet Approximations. J. Am. Statist. Assoc. 96 939–955.
  • [3] Armagan, A., Dunson, D. and Lee, J. (2011). Generalized double Pareto shrinkage. ArXiv e-prints.
  • [4] Baricz, A. (2008). Mills’ ratio: Monotonicity patterns and functional inequalities. J. Math. Anal. Applic. 340 1362–1370.
  • [5] Berger, J. O. and Robert, C. (1990). Subjective hierarchical Bayes estimation of a multivariate normal mean: on the frequentist interface. Ann. Statist. 18 617–651.
  • [6] Berger, J. O. and Strawderman, W. E. (1996). Choice of hierarchical priors: admissibility in estimation of normal means. Ann. Statist. 24 931–951.
  • [7] Berger, J. O., Strawderman, W. E. and Tang, D. (2005). Posterior Propriety and Admissibility of Hyperpriors in Normal Hierarchical Models. Ann. Statist. 33 606–646.
  • [8] Box, G. E. P. and Tiao, G. C. (1992). Bayesian Inference in Statistical Analysis (1973 ed., Wiley Classics Library). John Wiley and Sons, New York.
  • [9] Breheny, P. and Huang, J. (2009). Penalized methods for bi-level variable selection. Stat. Interface 2 369–380.
  • [10] Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5 232–253.
  • [11] Bruce, A. G. and Gao, H. Y. (1996). Applied Wavelet Analysis with S-Plus. Springer, New York.
  • [12] Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97 465–480.
  • [13] Chen, M.-H., Ibrahim, J. G. and Shao, Q.-M. (2006). Posterior Propriety and Computation for the Cox Regression Model with Applications to Missing Covariates. Biometrika 93 791–807.
  • [14] Chen, M.-H. and Shao, Q.-M. (2001). Propriety of Posterior Distribution for Dichotomous Quantal Response Models. Proceedings of the American Mathematical Society 129 293–302.
  • [15] Cox, D. R. (1972). Regression Models and Life-Tables. Journal of the Royal Statistical Society. Series B (Methodological) 34 187–220.
  • [16] Fan, J. and Li, R. (2001). Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. J. Am. Statist. Assoc. 96 1348–1360.
  • [17] Fourdrinier, D., Strawderman, W. E. and Wells, M. T. (1998). On the construction of Bayes minimax estimators. Ann. Statist. 26 660–671.
  • [18] Gao, H. and Bruce, A. G. (1997). Waveshrink with firm shrinkage. Statist. Sinica 7 855–874.
  • [19] Gomez-Sanchez-Manzano, E., Gomez-Villegas, M. A. and Marin, J. M. (2008). Multivariate exponential power distributions as mixtures of normal distributions with Bayesian applications. Comm. Stat. Thry. Meth. 37 972–985.
  • [20] Griffin, J. E. and Brown, P. J. (2007). Bayesian adaptive Lassos with non-convex penalization. Technical Report, Dept. of Statistics, University of Warwick.
  • [21] Griffin, J. E. and Brown, P. J. (2010). Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis 6 171–188.
  • [22] Hans, C. (2009). Bayesian Lasso regression. Biometrika 96 835–845.
  • [23] Johnstone, I. M. and Silverman, B. W. (2004). Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences. Ann. Statist. 32 1594–1649.
  • [24] Kass, R. E. and Wasserman, L. (1996). The Selection of Prior Distributions by Formal Rules. Journal of the American Statistical Association 91 1343–1370.
  • [25] Mazumder, R., Friedman, J. H. and Hastie, T. (2011). SparseNet: Coordinate Descent With Nonconvex Penalties. Journal of the American Statistical Association 106 1125–1138.
  • [26] Park, T. and Casella, G. (2008). The Bayesian Lasso. J. Am. Statist. Assoc. 103 681–686.
  • [27] Polson, N. G. and Scott, J. G. (2011). Shrink Globally, Act Locally: Sparse Bayesian Regularization and Prediction (with discussion). In Bayesian Statistics 9 (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman and M. West, eds.) 501–525. Oxford University Press.
  • [28] Robert, C. P. (2007). The Bayesian Choice. Springer-Verlag, New York.
  • [29] Rockafellar, R. T. and Wets, R. J. B. (2004). Variational Analysis. Springer-Verlag, Berlin.
  • [30] Sampford, M. R. (1953). Some Inequalities on Mill’s Ratio and Related Functions. Ann. Math. Statist. 24 130–132.
  • [31] Schifano, E. D. (2010). Topics in Penalized Estimation. PhD thesis, Cornell University.
  • [32] Schifano, E. D., Strawderman, R. L. and Wells, M. T. (2010). Majorization-minimization algorithms for nonsmoothly penalized objective functions. Electron. J. Stat. 4 1258–1299.
  • [33] Strawderman, W. E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. Ann. Math. Statist. 42 385–388.
  • [34] Strawderman, R. L. and Wells, M. T. (2012). On Hierarchical Prior Specifications and Penalized Likelihood. In Contemporary Developments in Bayesian Analysis and Statistical Decision Theory: A Festschrift for William E. Strawderman (D. Fourdrinier, E. Marchand and A. Rukhin, eds.) 8 154–180. Institute of Mathematical Statistics, Hayward, CA.
  • [35] Takada, Y. (1979). Stein’s positive part estimator and Bayes estimator. Ann. Inst. Statist. Math. 31 177–183.
  • [36] Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. J. R. Statist. Soc. B 58 267–288.
  • [37] Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1 211–244.
  • [38] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Statist. Soc. B 68 49–67.
  • [39] Zhang, C.-H. (2010). Nearly Unbiased Variable Selection Under Minimax Concave Penalty. Ann. Statist. 38 894–942.
  • [40] Zlobec, S. (2003). Estimating convexifiers in continuous optimization. Math. Comm. 8 129–137.
  • [41] Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509–1533.