Electronic Journal of Statistics

Semiparametric estimation of a two-component mixture of linear regressions in which one component is known

L. Bordes, I. Kojadinovic, and P. Vandekerkhove

Full-text: Open access

Abstract

A new estimation method for the two-component mixture model introduced in [29] is proposed. This model consists of a two-component mixture of linear regressions in which one component is entirely known while the proportion, the slope, the intercept and the error distribution of the other component are unknown. In spite of good performance for datasets of reasonable size, the method proposed in [29] suffers from a serious drawback when the sample size becomes large, as it is based on the optimization of a contrast function whose pointwise computation requires $O(n^{2})$ operations. The range of applicability of the method derived in this work is substantially larger, as it relies on a method-of-moments estimator free of tuning parameters whose computation requires $O(n)$ operations. From a theoretical perspective, the asymptotic normality of both the estimator of the Euclidean parameter vector and the semiparametric estimator of the c.d.f. of the error is proved under weak conditions not involving zero-symmetry assumptions. In addition, an approximate confidence band for the c.d.f. of the error can be computed using a weighted bootstrap whose asymptotic validity is proved. The finite-sample performance of the resulting estimation procedure is studied under various scenarios through Monte Carlo experiments. The proposed method is illustrated on three real datasets of size $n=150$, 51 and 176,343, respectively. Two extensions of the considered model are discussed in the final section: a model with an additional scale parameter for the first component, and a model with more than one explanatory variable.
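
As a rough illustration of the kind of model described in the abstract, the following Python sketch simulates a two-component mixture of linear regressions in which one component is known. Everything specific here is an assumption made for illustration and is not taken from the paper: the known component is taken to be standard normal noise around a zero regression line, the unknown component has a centered exponential (asymmetric) error, and the parameter values and the function name `simulate_mixture` are hypothetical. The sketch does not implement the proposed estimator. It only shows that an ordinary least-squares fit on the pooled sample recovers the products $\pi\alpha$ and $\pi\beta$, not the proportion, intercept and slope separately, which suggests why more than the first conditional moment is needed.

```python
import numpy as np


def simulate_mixture(n, pi, alpha, beta, rng):
    """Draw (x, y) from an illustrative two-component mixture of regressions.

    Assumptions (not from the paper):
      - known component:   y = e0,                  e0 ~ N(0, 1)
      - unknown component: y = alpha + beta*x + e1, e1 ~ Exp(1) - 1 (mean zero)
      - each observation comes from the unknown component with probability pi
    """
    x = rng.normal(size=n)                  # explanatory variable
    z = rng.random(n) < pi                  # latent component labels
    e0 = rng.normal(size=n)                 # error of the known component
    e1 = rng.exponential(size=n) - 1.0      # asymmetric, centered error
    y = np.where(z, alpha + beta * x + e1, e0)
    return x, y


rng = np.random.default_rng(0)
x, y = simulate_mixture(200_000, pi=0.7, alpha=1.0, beta=2.0, rng=rng)

# Least squares on the pooled data costs O(n) operations.
# np.polyfit returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(x, y, 1)

# Under the assumptions above, E[Y | X] = pi*alpha + pi*beta*X, so the fit
# should land near pi*alpha = 0.7 and pi*beta = 1.4.
print(intercept, slope)
```

Under these assumptions, many triples $(\pi, \alpha, \beta)$ share the same fitted line, so the first conditional moment alone cannot separate the proportion from the regression coefficients.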

Article information

Source
Electron. J. Statist., Volume 7 (2013), 2603-2644.

Dates
First available in Project Euclid: 23 October 2013

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1382547605

Digital Object Identifier
doi:10.1214/13-EJS858

Mathematical Reviews number (MathSciNet)
MR3121625

Zentralblatt MATH identifier
1294.62151

Subjects
Primary: 62J05: Linear regression
Secondary: 62G08: Nonparametric regression

Keywords
Asymptotic normality; identifiability; linear regression; method of moments; mixture; multiplier central limit theorem; weighted bootstrap

Citation

Bordes, L.; Kojadinovic, I.; Vandekerkhove, P. Semiparametric estimation of a two-component mixture of linear regressions in which one component is known. Electron. J. Statist. 7 (2013), 2603–2644. doi:10.1214/13-EJS858. https://projecteuclid.org/euclid.ejs/1382547605



References

  • [1] J.A. Anderson. Multivariate logistic compounds. Biometrika, pages 17–26, 1979.
  • [2] T. Benaglia, D. Chauveau, D.R. Hunter, and D. Young. mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software, 32(6):1–29, 2009. URL http://www.jstatsoft.org/v32/i06/.
  • [3] G. Boiteau, M. Singh, R.P. Singh, G.C.C. Tai, and T.R. Turner. Rate of spread of PVY-n by alate Myzus persicae (Sulzer) from infected to healthy plants under laboratory conditions. Potato Research, 41:335–344, 1998.
  • [4] L. Bordes, C. Delmas, and P. Vandekerkhove. Estimating a two-component mixture model when a component is known. Scandinavian Journal of Statistics, 33(4):733–752, 2006.
  • [5] E.A. Cohen. Inharmonic Tone Perception. PhD thesis, Stanford University, 1980.
  • [6] R.D. De Veaux. Mixtures of linear regressions. Computational Statistics and Data Analysis, 8:227–245, 1989.
  • [7] T. Duong. ks: Kernel smoothing, 2012. URL http://CRAN.R-project.org/package=ks. R package version 1.8.8.
  • [8] I.K. Glad, N.L. Hjort, and N.G. Ushakov. Correction of density estimators that are not densities. Scandinavian Journal of Statistics, 30:415–427, 2003.
  • [9] B. Grün and F. Leisch. Fitting finite mixtures of linear regression models with varying and fixed effects in R. In A. Rizzi and M. Vichi, editors, Compstat 2006, Proceedings in Computational Statistics, pages 853–860. Physica Verlag, Heidelberg, Germany, 2006.
  • [10] P. Hall and X-H. Zhou. Nonparametric estimation of component distributions in a multivariate mixture. Annals of Statistics, 31:201–224, 2003.
  • [11] D.S. Hawkins, D.M. Allen, and A.J. Stromberg. Determining the number of components in mixtures of linear models. Computational Statistics and Data Analysis, 38:15–48, 2001.
  • [12] D.R. Hunter and D.S. Young. Semiparametric mixtures of regressions. Journal of Nonparametric Statistics, pages 19–38, 2012.
  • [13] M. Hurn, A. Justel, and C.P. Robert. Estimating mixtures of regressions. Journal of Computational and Graphical Statistics, 12:1–25, 2003.
  • [14] P.N. Jones and G.J. McLachlan. Fitting finite mixture models in a regression context. Australian Journal of Statistics, 34:233–240, 1992.
  • [15] M.R. Kosorok. Introduction to empirical processes and semiparametric inference. Springer, New York, 2008.
  • [16] F. Leisch. Flexmix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 2004. URL http://www.jstatsoft.org/v11/i08/.
  • [17] D.H-Y. Leung and J. Qin. Semi-parametric inference in a bivariate (multivariate) mixture model. Statistica Sinica, 16:153–163, 2006.
  • [18] M-L. Martin-Magniette, T. Mary-Huard, C. Bérard, and S. Robin. ChIPmix: Mixture model of regressions for two-color ChIP-chip analysis. Bioinformatics, 24:181–186, 2008.
  • [19] D. Nolan and D. Pollard. $U$-processes: Rates of convergence. Annals of Statistics, 15:780–799, 1987.
  • [20] R. Quandt and J. Ramsey. Estimating mixtures of normal distributions and switching regression. Journal of the American Statistical Association, 73:730–738, 1978.
  • [21] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2013. URL http://www.R-project.org. ISBN 3-900051-07-0.
  • [22] N. Städler, P. Bühlmann, and S. van de Geer. $\ell_1$-penalization for mixture of regression models. Test, 19:209–256, 2010.
  • [23] R. Turner. mixreg: Functions to fit mixtures of regressions, 2011. URL http://CRAN.R-project.org/package=mixreg. R package version 0.0-4.
  • [24] T.R. Turner. Estimating the propagation rate of a viral infection of potato plants via mixtures of regressions. Applied Statistics, 49:371–384, 2000.
  • [25] A.W. van der Vaart. Asymptotic statistics. Cambridge University Press, 1998.
  • [26] A.W. van der Vaart. Semiparametric statistics. In École d’été de Saint-Flour 1999, pages 331–457. Springer, New York, 2002.
  • [27] A.W. van der Vaart and J.A. Wellner. Weak convergence and empirical processes. Springer, New York, 2000. Second edition.
  • [28] A.W. van der Vaart and J.A. Wellner. Empirical processes indexed by estimated functions. In Asymptotics: Particles, Processes and Inverse Problems, pages 234–252. Institute of Mathematical Statistics, 2007.
  • [29] P. Vandekerkhove. Estimation of a semiparametric mixture of regressions model. Journal of Nonparametric Statistics, 25(1):181–208, 2013.
  • [30] M.P. Wand and M.C. Jones. Multivariate plugin bandwidth selection. Computational Statistics, 9:97–116, 1994.
  • [31] D.S. Young and D.R. Hunter. Mixtures of regressions with predictor-dependent mixing proportions. Computational Statistics and Data Analysis, pages 2253–2266, 2010.
  • [32] H. Zhu and H. Zhang. Hypothesis testing in mixture regression models. Journal of the Royal Statistical Society Series B, 66:3–16, 2004.