Electronic Journal of Statistics

Joint estimation and variable selection for mean and dispersion in proper dispersion models

Anestis Antoniadis, Irène Gijbels, Sophie Lambert-Lacroix, and Jean-Michel Poggi



When adequately describing complex data structures, one is often confronted with the fact that the mean as well as the variance (or, more generally, the dispersion) is strongly influenced by some covariates. A drawback of the available methods is that they are often based on approximations, and hence a theoretical study should also address these approximations. This, however, is often ignored, making the statistical inference incomplete. In the proposed framework of double generalized modelling based on proper dispersion models we avoid this drawback, and as such are in a good position to use recent results on Bregman divergence to establish theoretical results for the proposed estimators in fairly general settings. We also study variable selection when there is a large number of covariates, with this number possibly tending to infinity with the sample size. The proposed estimation and selection procedure is investigated via a simulation study that also includes a comparison with competitors. The use of the methods is illustrated via some real data applications.

Article information

Electron. J. Statist., Volume 10, Number 1 (2016), 1630-1676.

Received: March 2016
First available in Project Euclid: 18 July 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1468847266

Digital Object Identifier
doi:10.1214/16-EJS1152

Primary: 62Gxx: Nonparametric inference; 62Hxx: Multivariate analysis [See also 60Exx]
Secondary: 62Jxx: Linear inference, regression

Keywords: Bregman divergence; Fisher-orthogonality; penalization; proper dispersion models; variable selection; SCAD


Antoniadis, Anestis; Gijbels, Irène; Lambert-Lacroix, Sophie; Poggi, Jean-Michel. Joint estimation and variable selection for mean and dispersion in proper dispersion models. Electron. J. Statist. 10 (2016), no. 1, 1630–1676. doi:10.1214/16-EJS1152. https://projecteuclid.org/euclid.ejs/1468847266


