Bernoulli
Volume 18, Number 3 (2012), 945–974.

The log-linear group-lasso estimator and its asymptotic properties

Yuval Nardi and Alessandro Rinaldo

Full-text: Open access


We define the group-lasso estimator for the natural parameters of the exponential families of distributions representing hierarchical log-linear models under a multinomial sampling scheme. This estimator arises as the solution of a convex penalized likelihood optimization problem based on the group-lasso penalty. We illustrate how an estimator of the underlying log-linear model can be constructed from the blocks of nonzero coefficients recovered by the group-lasso procedure. We investigate the asymptotic properties of the group-lasso estimator as a model selection method in a double-asymptotic framework, in which both the sample size and the model complexity grow simultaneously. We provide conditions guaranteeing that the group-lasso estimator is model selection consistent, in the sense that, with overwhelming probability as the sample size increases, it correctly identifies all the sets of nonzero interactions among the variables. Provided the sequence of true underlying models is sparse enough, recovery is possible even when the number of cells grows larger than the sample size. Finally, we derive some central limit type results for the log-linear group-lasso estimator.
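The abstract does not reproduce the optimization problem itself. As an illustrative sketch (not the paper's exact formulation), the group-lasso penalty sums the Euclidean norms of blocks of coefficients, and its proximal operator, block soft-thresholding, zeroes out entire blocks at once, which is what makes recovery of whole sets of interaction terms possible. The function names below are hypothetical:

```python
import numpy as np

def group_lasso_penalty(theta, groups, lam):
    """Group-lasso penalty: lam times the sum of Euclidean norms of
    the coefficient blocks indexed by `groups`."""
    return lam * sum(np.linalg.norm(theta[g]) for g in groups)

def block_soft_threshold(theta, groups, t):
    """Proximal operator of the group-lasso penalty with threshold t.
    Blocks whose norm is at most t are set exactly to zero; the rest
    are shrunk toward zero by a factor (1 - t / norm)."""
    out = theta.copy()
    for g in groups:
        nrm = np.linalg.norm(theta[g])
        out[g] = 0.0 if nrm <= t else (1.0 - t / nrm) * theta[g]
    return out

# Example: the first block survives shrinkage, the second is zeroed.
theta = np.array([3.0, 4.0, 0.1, 0.1])
groups = [np.array([0, 1]), np.array([2, 3])]
shrunk = block_soft_threshold(theta, groups, 1.0)
```

In a proximal-gradient scheme for the penalized multinomial likelihood, each iteration would apply this operator after a gradient step on the negative log-likelihood; blocks driven to zero correspond to interaction terms excluded from the selected log-linear model.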

Article information


First available in Project Euclid: 28 June 2012


Keywords: consistency; group lasso; log-linear models; model selection


Nardi, Yuval; Rinaldo, Alessandro. The log-linear group-lasso estimator and its asymptotic properties. Bernoulli 18 (2012), no. 3, 945--974. doi:10.3150/11-BEJ364.



  • [1] Bentkus, V. (2003). On the dependence of the Berry–Esseen bound on dimension. J. Statist. Plann. Inference 113 385–402.
  • [2] Bertsekas, D.P. (1995). Nonlinear Programming. Belmont, MA: Athena Scientific.
  • [3] Bhattacharya, R.N. and Ranga Rao, R. (1976). Normal Approximation and Asymptotic Expansions. New York: Wiley.
  • [4] Bishop, Y.M.M., Fienberg, S.E. and Holland, P.W. (2007). Discrete Multivariate Analysis: Theory and Practice. New York: Springer.
  • [5] Bobkov, S.G. and Ledoux, M. (1998). On modified logarithmic Sobolev inequalities for Bernoulli and Poisson measures. J. Funct. Anal. 156 347–365.
  • [6] Brown, L.D. (1986). Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory. Institute of Mathematical Statistics Lecture Notes—Monograph Series 9. Hayward, CA: IMS.
  • [7] Dahinden, C., Parmiggiani, G., Emerick, M.C. and Bühlmann, P. (2007). Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries. BMC Bioinformatics 8 476.
  • [8] Darroch, J.N., Lauritzen, S.L. and Speed, T.P. (1980). Markov fields and log-linear interaction models for contingency tables. Ann. Statist. 8 522–539.
  • [9] Dobra, A. and Massam, H. (2010). The mode oriented stochastic search (MOSS) algorithm for log-linear models with conjugate priors. Stat. Methodol. 7 240–253.
  • [10] Edwards, D. (2000). Introduction to Graphical Modelling, 2nd ed. New York: Springer.
  • [11] Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
  • [12] Fienberg, S.E. and Rinaldo, A. (2007). Three centuries of categorical data analysis: Log-linear models and maximum likelihood estimation. J. Statist. Plann. Inference 137 3430–3445.
  • [13] Friedman, J., Hastie, T. and Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. Available at
  • [14] Ghosal, S. (2000). Asymptotic normality of posterior distributions for exponential families when the number of parameters tends to infinity. J. Multivariate Anal. 74 49–68.
  • [15] Greenshtein, E. (2006). Best subset selection, persistence in high-dimensional statistical learning and optimization under ℓ1 constraint. Ann. Statist. 34 2367–2386.
  • [16] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
  • [17] Haberman, S.J. (1974). The Analysis of Frequency Data. Chicago: Univ. Chicago Press.
  • [18] Lauritzen, S.L. (1996). Graphical Models. Oxford Statistical Science Series 17. New York: Oxford Univ. Press.
  • [19] Lauritzen, S.L. (2002). Lectures on contingency tables. Available at
  • [20] Meier, L., van de Geer, S. and Bühlmann, P. (2006). The group lasso for logistic regression. Research Report 131, Swiss Federal Institute of Technology, Zurich.
  • [21] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • [22] Morris, C. (1975). Central limit theorems for multinomial sums. Ann. Statist. 3 165–188.
  • [23] Nardi, Y. and Rinaldo, A. (2008). On the asymptotic properties of the group lasso estimator for linear models. Electron. J. Stat. 2 605–633.
  • [24] Negahban, S., Ravikumar, P., Wainwright, M.J. and Yu, B. (2010). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Available at
  • [25] Portnoy, S. (1986). On the central limit theorem in Rp when p → ∞. Probab. Theory Related Fields 73 571–583.
  • [26] Portnoy, S. (1988). Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity. Ann. Statist. 16 356–366.
  • [27] Puig, A., Wiesel, A. and Hero, A. (2009). A multidimensional shrinkage-thresholding operator. In Proceeding of the IEEE/SP 15th Workshop on Statistical Signal Processing.
  • [28] Quine, M.P. and Robinson, J. (1984). Normal approximations to sums of scores based on occupancy numbers. Ann. Probab. 12 794–804.
  • [29] Read, T.R.C. and Cressie, N.A.C. (1988). Goodness-of-fit Statistics for Discrete Multivariate Data. New York: Springer.
  • [30] Rinaldo, A. (2006). Computing maximum likelihood estimates in log-linear models. Technical Report 835, Dept. Statistics, Carnegie Mellon Univ.
  • [31] Rinaldo, A., Fienberg, S.E. and Zhou, Y. (2009). On the geometry of discrete exponential families with application to exponential random graph models. Electron. J. Stat. 3 446–484.
  • [32] Roth, V. and Fischer, B. (2008). The group-Lasso for generalized linear models: Uniqueness of solutions and efficient algorithms. In Proceedings of the 25th International Conference on Machine Learning.
  • [33] Schervish, M.J. (1995). Theory of Statistics. New York: Springer.
  • [34] Steck, G.P. (1957). Limit theorems for conditional distributions. Univ. California Publ. Statist. 2 237–284.
  • [35] van de Geer, S.A. (2006). High-dimensional generalized linear models and the Lasso. Research Report 133, Swiss Federal Institute of Technology, Zurich.
  • [36] van de Geer, S.A. (2006). On non-asymptotic bounds for estimation in generalized linear models with highly correlated design. Research Report 134, Swiss Federal Institute of Technology, Zurich.
  • [37] Wainwright, M.J. (2009). Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ1-constrained quadratic programming. IEEE Trans. Inform. Theory 55 2183–2202.
  • [38] Ravikumar, P., Wainwright, M.J. and Lafferty, J.D. (2010). High-dimensional Ising model selection using ℓ1-regularized logistic regression. Ann. Statist. 38 1287–1319.
  • [39] Yuan, M., Joseph, V.R. and Zou, H. (2009). Structured variable selection and estimation. Ann. Appl. Stat. 3 1738–1757.
  • [40] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49–67.
  • [41] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.