Electronic Journal of Statistics

Robustness in sparse high-dimensional linear models: Relative efficiency and robust approximate message passing

Jelena Bradic


Abstract

Understanding efficiency in high-dimensional linear models is a longstanding problem. Classical work on low-dimensional problems, dating back to Huber and Bickel, has illustrated the clear benefits of efficient loss functions. When the number of parameters $p$ is of the same order as the sample size $n$, $p\approx n$, an efficiency pattern different from Huber's was recently established. In this work, we study the relative efficiency of sparse linear models with $p\gg n$. To derive the asymptotic mean squared error of $l_{1}$-regularized M-estimators, we propose a novel robust and sparse approximate message passing algorithm (RAMP) that is adaptive to the error distribution. Our algorithm accommodates many non-quadratic and non-differentiable loss functions. We derive its asymptotic mean squared error and show its convergence, while allowing $p,n,s\to \infty$, with $n/p\in (0,1)$ and $n/s\in (1,\infty)$. We identify new patterns of relative efficiency for $l_{1}$-penalized M-estimators. We show that the classical information bound is no longer attainable, even for light-tailed error distributions. Moreover, we identify new breakdown points of the asymptotic mean squared error. For the $l_{1}$-penalized least absolute deviation estimator (P-LAD), the asymptotic mean squared error breaks down at a critical ratio of the number of observations to the number of sparse parameters when the error distribution is light-tailed; for heavy-tailed error distributions, it breaks down at a critical ratio of the optimal tuning parameter of P-LAD to the optimal tuning parameter of the $l_{1}$-penalized least squares estimator.
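To make the flavor of the algorithm concrete, the sketch below implements the generic approximate message passing recursion for the quadratic-loss special case, i.e. the $l_{1}$-penalized least squares (Lasso) problem, using soft thresholding and the Onsager correction term. It is a minimal illustration only, not the paper's RAMP: RAMP replaces the quadratic residual step with one built from the proximal operator of a robust, possibly non-differentiable loss (e.g. LAD or Huber) and tunes it adaptively to the error distribution. The function names (`amp_lasso`, `soft_threshold`) and the threshold rule governed by `alpha` are illustrative assumptions, not notation from the paper.

```python
import numpy as np


def soft_threshold(x, theta):
    """Soft-thresholding: the proximal map of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)


def amp_lasso(y, X, alpha=1.5, n_iter=30):
    """Generic AMP iteration for l1-penalized least squares (Lasso).

    y     : (n,) response vector
    X     : (n, p) design, assumed to have roughly i.i.d. N(0, 1/n) entries
    alpha : threshold multiplier (illustrative tuning choice)
    """
    n, p = X.shape
    beta = np.zeros(p)
    z = y.copy()
    for _ in range(n_iter):
        # Pseudo-data for each coordinate: current estimate plus back-projected residual.
        r = beta + X.T @ z
        # Threshold tracked via the empirical residual scale (proxy for state evolution).
        theta = alpha * np.sqrt(np.mean(z ** 2))
        beta_new = soft_threshold(r, theta)
        # Residual update with the Onsager correction (fraction of active coordinates).
        z = y - X @ beta_new + (np.count_nonzero(beta_new) / n) * z
        beta = beta_new
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, s = 500, 1000, 25          # n/p in (0,1), n/s in (1, infinity)
    X = rng.normal(scale=1.0 / np.sqrt(n), size=(n, p))
    beta0 = np.zeros(p)
    beta0[:s] = 3.0 * rng.normal(size=s)
    y = X @ beta0 + 0.5 * rng.standard_normal(n)   # light-tailed noise for illustration
    beta_hat = amp_lasso(y, X)
    print("per-coordinate MSE:", np.mean((beta_hat - beta0) ** 2))
```

In the robust setting studied in the paper, the quadratic residual step above is the piece that changes: the residuals are processed through an effective score derived from the chosen robust loss, which is what allows non-quadratic and non-differentiable losses such as LAD.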

Article information

Source
Electron. J. Statist., Volume 10, Number 2 (2016), 3894-3944.

Dates
Received: August 2015
First available in Project Euclid: 13 December 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1481598073

Digital Object Identifier
doi:10.1214/16-EJS1212

Mathematical Reviews number (MathSciNet)
MR3581957

Zentralblatt MATH identifier
1357.62215

Subjects
Primary: 62G35 (Robustness), 62J07 (Ridge regression; shrinkage estimators)
Secondary: 60F05 (Central limit and other weak theorems)

Keywords
Lasso, LAD, efficiency, robustness, sparsity, AMP

Citation

Bradic, Jelena. Robustness in sparse high-dimensional linear models: Relative efficiency and robust approximate message passing. Electron. J. Statist. 10 (2016), no. 2, 3894--3944. doi:10.1214/16-EJS1212. https://projecteuclid.org/euclid.ejs/1481598073



References

  • [1] Avella Medina, M.A. and Ronchetti, E. (2014), Robust and consistent variable selection for generalized linear and additive models (310). Retrieved from http://archive-ouverte.unige.ch/unige:36961
  • [2] Bai, Z.D. and Yin, Y.Q. (1993), Limit of the Smallest Eigenvalue of a Large Dimensional Sample Covariance Matrix, The Annals of Probability, 21, 1275–1294.
  • [3] Bayati, M. and Montanari, A. (2011), The dynamics of message passing on dense graphs, with applications to compressed sensing, IEEE Trans. on Inform. Theory, 57 (2), 764–785.
  • [4] Bayati, M. and Montanari, A. (2012), The LASSO risk for Gaussian matrices, IEEE Trans. on Inform. Theory, 58 (4), 1997–2017.
  • [5] Bean, D., Bickel, P.J., Karoui, N.E. and Yu, B. (2013), Optimal M-estimation in high-dimensional regression, Proceedings of the National Academy of Sciences, 110 (36), 14563–14568.
  • [6] Beck, A. and Teboulle, M. (2009), A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, 2 (1), 183–202.
  • [7] Belloni, A. and Chernozhukov, V. (2011), $l_1$-penalized quantile regression in high-dimensional sparse models, The Annals of Statistics, 39 (1), 82–130.
  • [8] Bertsekas, D.P. and Tsitsiklis, J.N. (1999), Gradient Convergence in Gradient methods with Errors., SIAM J. on Optimization, 10 (3), 627–642.
  • [9] Bickel, P.J. (1975), One-step Huber estimates in the linear model., Journal of the American Statistical Association, 70 (350), 428–434.
  • [10] Bickel, P.J., Ritov, Y. and Tsybakov, A.B. (2009), Simultaneous analysis of Lasso and Dantzig selector, The Annals of Statistics, 37 (4), 1705–1732.
  • [11] Box, G.E.P. (1953), Non-normality and tests on variances., Biometrika, 40 (3–4), 318–335.
  • [12] Box, G.E.P. and Andersen, S.L. (1955), Permutation Theory in the Derivation of Robust Criteria and the Study of Departures from Assumption, Journal of the Royal Statistical Society. Series B (Methodological) 17 (1), 1–34.
  • [13] Boucheron, S. and Lugosi, G. and Massart, P. (2013), Concentration Inequalities: A nonasymptotic theory of independence, Oxford University Press, Oxford, 481.
  • [14] Bradic, J., Fan, J. and Wang, W. (2011), Penalized composite quasi-likelihood for ultrahigh dimensional variable selection., Journal of the Royal Statistical Society: Series B, 73 (3), 325-349.
  • [15] Bühlmann, P. and van de Geer, S. (2011), Statistics for High-Dimensional Data: Methods, Theory and Applications, Springer Series in Statistics, 550.
  • [16] Chen, Z., Tang, M.-L., Gao, W. and Shi, N.-Z. (2014), New Robust Variable Selection Methods for Linear Regression Models, Scandinavian Journal of Statistics, 41, 725–741.
  • [17] Donoho, D. L. and Liu, R. C. (1988), The “Automatic” Robustness of Minimum Distance Functionals., Ann. Statist., 16 (2), 552–586.
  • [18] Donoho, D., Maleki, A. and Montanari, A. (2010), The Noise-Sensitivity Phase Transition in Compressed Sensing, IEEE Trans. on Inform. Theory, 57 (10), 6920–6941.
  • [19] Donoho, D., Maleki, A. and Montanari, A. (2010), Message passing algorithms for compressed sensing: I. Motivation and construction, 2010 IEEE Information Theory Workshop on Information Theory (ITW 2010, Cairo), vol. 1, no. 5, pp. 6–8.
  • [20] Donoho, D. and Montanari, A. (2013), High Dimensional Robust M-Estimation: Asymptotic Variance via Approximate Message Passing, http://arxiv.org/pdf/1310.7320v3.pdf
  • [21] Fan, J. and Li, R. (2001), Variable selection via non concave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96 (456), 1348–1360.
  • [22] Fan, J., Fan, Y. and Barut, E. (2014), Adaptive Robust Variable Selection, The Annals of Statistics, 42 (1), 324–351.
  • [23] Fan, J., Li, Q. and Wang, Y. (2014), Robust Estimation of High-Dimensional Mean Regression, http://arxiv.org/pdf/1410.2150v1.pdf
  • [24] Hampel, F.R. (1968), Contributions to the Theory of Robust Estimation, Ph.D. Thesis, University of California, Berkeley.
  • [25] Huber, P.J. (1964), Robust estimation of a location parameter, Ann. Math. Statist., 35, 73–101.
  • [26] Huber, P.J. (1973), Robust regression: Asymptotics, conjectures and Monte Carlo, Annals of Statistics, 1 (5), 799–821.
  • [27] Huber, P.J. (1981), Robust Statistics, Wiley, New York.
  • [28] Jurečková, J. and Sen, P.K. (1996), Robust Statistical Procedures: Asymptotics and Interrelations, Wiley Series in Probability and Statistics, New York.
  • [29] Karoui, N. (2013), Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: rigorous results, http://arxiv.org/pdf/1311.2445v1.pdf
  • [30] Lambert-Lacroix, S. and Zwald, L. (2011), Robust regression through the Huber’s criterion and adaptive lasso penalty, Electron. J. Statist, 5, 1015–1053.
  • [31] Lerman, G., McCoy, M., Tropp, J.A. and Zhang, T. (2015), Robust computation of linear models via convex relaxation, Found. Comput. Math, 15 (2), 363–410.
  • [32] Loh, P.-L. (2015), Statistical consistency and asymptotic normality for high-dimensional robust M-estimators, http://arxiv.org/pdf/arXiv:1501.00312
  • [33] Iusem, A.N. and Teboulle, M. (1995), Convergence Rate Analysis of Nonquadratic Proximal Methods for Convex and Linear Programming, Mathematics of Operations Research, 20 (3), 657–677
  • [34] Mammen, E. (1989), Asymptotics with increasing dimension for robust regression with applications to the bootstrap, Annals of Statistics, 17 (1), 382–400.
  • [35] Maronna, R.A., Martin, R.D. and Yohai, V.J. (2006), Robust Statistics: Theory and Methods, Wiley.
  • [36] Negahban, S., Ravikumar, P., Wainwright, M.J. and Yu B. (2012), A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers., Statistical Science, 27 (4), 538–557.
  • [37] Portnoy, S. (1985), Asymptotic behavior of M estimators of p regression parameters when $p^2/n$ is large; II. Normal approximation, Annals of Statistics, 13 (4), 1403–1417.
  • [38] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso., Journal of the Royal Statistical Society, B, 58, 267–288.
  • [39] Tukey, J.W. (1960), A survey of sampling from contaminated distributions., In: Contributions to Prob. and Statist. (Olkin, I., Ed.), Stanford Univ. Press, Stanford, 448–485.
  • [40] Rousseeuw, P.J. (1984), Least Median of Squares Regression, Journal of the American Statistical Association, 79 (388), 871–880.
  • [41] Wang, L. (2013), $L_1$ penalized LAD estimator for high dimensional linear regression, Journal of Multivariate Analysis, 120, 135–151.
  • [42] Wang, X., Jiang, Y., Huang, M. and Zhang, H. (2013), Robust Variable Selection With Exponential Squared Loss, Journal of the American Statistical Association, 108 (502), 632–643.
  • [43] Wu, Y. and Liu, Y. (2009), Variable selection in quantile regression, Statistica Sinica, 19, 801–817.
  • [44] Yohai, V.J. (1987), High breakdown-point and high efficiency robust estimates for regression., Annals of Statistics, 15 (2), 642–656.
  • [45] Zhang, C.-H. (2010), Nearly unbiased variable selection under minimax concave penalty., Ann. Statist., 38 (2), 894–942.
  • [46] Zheng, L., Maleki, A., Wang, X., and Long, T. (2015), Does $\ell_p$-minimization outperform $\ell_1$-minimization?, http://arxiv.org/pdf/1501.03704v1.pdf
  • [47] Zou, H. and Hastie, T. (2005), Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67 (2), 301–320.
  • [48] Zou, H. (2006), The Adaptive Lasso and Its Oracle Properties, Journal of the American Statistical Association, 101 (476), 1418–1429.