Electronic Journal of Statistics

Flexible estimation of a semiparametric two-component mixture model with one parametric component

Yanyuan Ma and Weixin Yao

Full-text: Open access


We study a two-component semiparametric mixture model where one component distribution belongs to a parametric class, while the other is symmetric but otherwise arbitrary. This semiparametric model has wide applications in many areas such as large-scale simultaneous testing/multiple testing, sequential clustering, and robust modeling. We develop a class of estimators that are surprisingly simple and are unique in terms of their construction. A unique feature of these methods is that they do not rely on the estimation of the nonparametric component of the model. Instead, the methods only require a working model of the unspecified distribution, which may or may not reflect the true distribution. In addition, we establish connections between the existing estimator and the new methods and further derive a semiparametric efficient estimator. We compare our estimators with the existing method and investigate the advantages and cost of the relatively simple estimation procedure.
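To make the model in the abstract concrete, the following is a minimal simulation sketch, not the authors' estimator. It draws data from a two-component mixture in which one component is parametric and the other is symmetric about an unknown center. The function name `sample_mixture`, the symbols `p` (mixing proportion) and `mu` (center of symmetry), the standard normal parametric component, and the Laplace symmetric component are all illustrative assumptions that do not come from the abstract.

```python
import random


def sample_mixture(n, p, mu, seed=0):
    """Draw n points from (1 - p) * N(0, 1) + p * g(x - mu).

    Assumptions for illustration only: the parametric component is
    standard normal, and g is a Laplace density, chosen merely because
    it is symmetric about 0. Any symmetric g would fit the model class.
    """
    rng = random.Random(seed)
    xs, labels = [], []
    for _ in range(n):
        from_symmetric = rng.random() < p
        if from_symmetric:
            # Exponential magnitude with a random sign gives Laplace noise,
            # which is symmetric about 0; shifting by mu centers it at mu.
            noise = rng.expovariate(1.0) * rng.choice((-1.0, 1.0))
            xs.append(mu + noise)
        else:
            xs.append(rng.gauss(0.0, 1.0))
        labels.append(from_symmetric)
    return xs, labels


xs, labels = sample_mixture(10_000, p=0.3, mu=3.0)
```

In practice only `xs` would be observed. The labels are returned here so that a simulation study can check an estimate of `p` or `mu` against the truth.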

Article information

Electron. J. Statist., Volume 9, Number 1 (2015), 444-474.

First available in Project Euclid: 24 March 2015

Keywords: Efficiency; large-scale simultaneous testing; mixture models; multiple testing; robust statistics; semiparametric estimator


Ma, Yanyuan; Yao, Weixin. Flexible estimation of a semiparametric two-component mixture model with one parametric component. Electron. J. Statist. 9 (2015), no. 1, 444–474. doi:10.1214/15-EJS1008. https://projecteuclid.org/euclid.ejs/1427203125

