The Annals of Applied Statistics

Feature selection guided by structural information

Martin Slawski, Wolfgang zu Castell, and Gerhard Tutz

Full-text: Open access


In generalized linear regression problems with an abundant number of features, lasso-type regularization, which imposes an ℓ1-constraint on the regression coefficients, has become a widely established technique. Deficiencies of the lasso in certain scenarios, notably in the presence of strongly correlated designs, motivated Zou and Hastie [J. Roy. Statist. Soc. Ser. B 67 (2005) 301–320] to introduce the elastic net. In this paper we propose to extend the elastic net by admitting general nonnegative quadratic constraints as a second form of regularization. The generalized ridge-type constraint will typically make use of the known association structure of features, for example, by exploiting temporal or spatial closeness.

We study properties of the resulting “structured elastic net” regression estimation procedure, including basic asymptotics and the issue of model selection consistency. In this vein, we provide an analog to the so-called “irrepresentable condition” which holds for the lasso. Moreover, we outline algorithmic solutions for the structured elastic net within the generalized linear model family. The rationale and the performance of our approach are illustrated by means of simulated and real-world data, with a focus on signal regression.
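The abstract describes a penalty combining an ℓ1 term with a generalized nonnegative quadratic form encoding feature structure. As a minimal illustrative sketch (not the paper's full estimation procedure), the following assumes a squared-error loss and uses the Laplacian of a chain graph as the quadratic regularizer, a natural choice when neighboring features are temporally or spatially close; all function names here are hypothetical:

```python
import numpy as np

def chain_laplacian(p):
    """Graph Laplacian of a chain graph on p features.

    Adjacent features i and i+1 are linked, so the quadratic form
    beta @ L @ beta penalizes differences between neighboring
    coefficients (temporal/spatial smoothness).
    """
    L = np.zeros((p, p))
    for i in range(p - 1):
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    return L

def structured_enet_objective(beta, X, y, lam1, lam2, Lap):
    """Structured elastic net objective for Gaussian responses:

        0.5 * ||y - X beta||^2  +  lam1 * ||beta||_1
                                +  lam2 * beta' Lap beta

    With Lap = identity this reduces to the ordinary elastic net.
    """
    resid = y - X @ beta
    return (0.5 * resid @ resid
            + lam1 * np.abs(beta).sum()
            + lam2 * beta @ Lap @ beta)
```

A constant coefficient profile incurs no quadratic penalty under the chain Laplacian (its rows sum to zero), whereas a rough profile does, which is the sense in which the constraint favors structurally coherent solutions.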

Article information

Ann. Appl. Stat., Volume 4, Number 2 (2010), 1056-1080.

First available in Project Euclid: 3 August 2010


Keywords: generalized linear model; regularization; sparsity; p ≫ n; lasso; elastic net; random fields; model selection; signal regression


Slawski, Martin; zu Castell, Wolfgang; Tutz, Gerhard. Feature selection guided by structural information. Ann. Appl. Stat. 4 (2010), no. 2, 1056–1080. doi:10.1214/09-AOAS302.



  • Belkin, M., Niyogi, P. and Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7 2399–2434.
  • Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Statist. Soc. Ser. B 36 192–236.
  • Chung, F. (1997). Spectral Graph Theory. Amer. Math. Soc., Providence, RI.
  • Daumer, M., Thaler, K., Kruis, E., Feneberg, W., Staude, G. and Scholz, M. (2007). Steps towards a miniaturized, robust and autonomous measurement device for the long-term monitoring of patient activity: ActiBelt. Biomed. Tech. 52 149–155.
  • Donoho, D., Elad, M. and Temlyakov, V. (2006). Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 6–18.
  • Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407–499.
  • Eilers, P. and Marx, B. (1996). Flexible smoothing with B-splines and penalties (with discussion). Statist. Sci. 11 89–121.
  • Eilers, P. and Marx, B. (1999). Generalized linear regression on sampled signals and curves: A P-spline approach. Technometrics 41 1–13.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Frank, I. and Friedman, J. (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics 35 109–148.
  • Friedman, J., Hastie, T., Hoefling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302–332.
  • Genkin, A., Lewis, D. and Madigan, D. (2007). Large-scale Bayesian logistic regression for text categorization. Technometrics 49 589–616.
  • Goeman, J. (2007). An efficient algorithm for ℓ1-penalized estimation. Technical report, Dept. Medical Statistics and Bioinformatics, Univ. Leiden.
  • Hastie, T., Buja, A. and Tibshirani, R. (1995). Penalized discriminant analysis. Ann. Statist. 23 73–102.
  • Hoerl, A. and Kennard, R. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12 55–67.
  • Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356–1378.
  • Le Cun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W. and Jackel, L. (1989). Backpropagation applied to handwritten zip code recognition. Neural Comput. 1 541–551.
  • McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. Chapman & Hall, London.
  • Park, T. and Casella, G. (2008). The Bayesian lasso. J. Amer. Statist. Assoc. 103 681–686.
  • Rosenberg, S. (1997). The Laplacian on a Riemannian Manifold. Cambridge Univ. Press, Cambridge.
  • Rosset, S., Zhu, J. and Hastie, T. (2004). Boosting as a regularized path to a maximum margin classifier. J. Mach. Learn. Res. 5 941–973.
  • Rue, H. and Held, L. (2005). Gaussian Markov Random Fields. Chapman & Hall/CRC, Boca Raton.
  • Slawski, M., zu Castell, W. and Tutz, G. (2009). Feature selection guided by structural information. Technical report, Dept. Statistics, Univ. Munich. Available at
  • Slawski, M., zu Castell, W. and Tutz, G. (2010). Supplement to “Feature selection guided by structural information.” DOI: 10.1214/09-AOAS302SUPP.
  • Tibshirani, R. (1996). Regression shrinkage and variable selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. Roy. Statist. Soc. Ser. B 67 91–108.
  • Tutz, G. and Gertheiss, J. (2010). Feature extraction in signal regression: A boosting technique for functional data regression. J. Comput. Graph. Statist. 19 154–174.
  • Zhao, P. and Yu, B. (2006). On model selection consistency of the lasso. J. Mach. Learn. Res. 7 2541–2567.
  • Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
  • Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B 67 301–320.
