Electronic Journal of Statistics

Finite mixture regression: A sparse variable selection by model selection for clustering

Emilie Devijver

Abstract

We consider a finite mixture of Gaussian regression models for high-dimensional data, where the number of covariates may be much larger than the sample size. We propose to estimate the unknown conditional mixture density by a maximum likelihood estimator restricted to the relevant variables, which are selected by an $\ell_{1}$-penalized maximum likelihood estimator. We obtain an oracle inequality satisfied by this estimator for a Jensen-Kullback-Leibler type loss. This oracle inequality is deduced from a general model selection theorem for maximum likelihood estimators on a random model subcollection. From it we derive the shape of the penalty in the criterion, which depends on the complexity of the random model collection.
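
The procedure summarized in the abstract can be sketched in formulas. The notation below is an assumption made for illustration and is not taken from the paper: a univariate response $y$, covariates $x \in \mathbb{R}^{p}$, $K$ mixture components with weights $\pi_k$, regression vectors $\beta_k$ and standard deviations $\sigma_k$, a sample $(x_i, y_i)_{1 \le i \le n}$, and a regularization parameter $\lambda > 0$. The exact form of the $\ell_{1}$ penalty and of the final penalty $\mathrm{pen}(m)$ may differ in the paper; only the overall structure is illustrated.

$$
\begin{aligned}
&\text{Model:} && s_{\theta}(y \mid x) = \sum_{k=1}^{K} \pi_k \, \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\!\left( -\frac{(y - \beta_k^{\top} x)^{2}}{2\sigma_k^{2}} \right), \\
&\text{Variable selection:} && \hat{\theta}^{\mathrm{Lasso}}(\lambda) \in \arg\min_{\theta} \left\{ -\frac{1}{n} \sum_{i=1}^{n} \log s_{\theta}(y_i \mid x_i) + \lambda \sum_{k=1}^{K} \lVert \beta_k \rVert_{1} \right\}, \\
& && \hat{J}_{\lambda} = \left\{ j \in \{1,\dots,p\} : \hat{\beta}^{\mathrm{Lasso}}_{k,j}(\lambda) \neq 0 \text{ for some } k \right\}, \\
&\text{Restricted MLE:} && \hat{s}_{m} \in \arg\min_{\theta \,:\, \beta_{k,j} = 0 \ \forall k,\ \forall j \notin J} \left\{ -\frac{1}{n} \sum_{i=1}^{n} \log s_{\theta}(y_i \mid x_i) \right\}, \qquad m = (K, J), \\
&\text{Model selection:} && \hat{m} \in \arg\min_{m \in \hat{\mathcal{M}}} \left\{ -\frac{1}{n} \sum_{i=1}^{n} \log \hat{s}_{m}(y_i \mid x_i) + \mathrm{pen}(m) \right\}.
\end{aligned}
$$

Here $\hat{\mathcal{M}}$ stands for the random model subcollection obtained by varying $K$ and $\lambda$ (so that $J = \hat{J}_{\lambda}$ depends on the data), which is why the penalty has to account for the complexity of a random collection rather than a fixed one.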

Article information

Source
Electron. J. Statist., Volume 9, Number 2 (2015), 2642-2674.

Dates
Received: September 2014
First available in Project Euclid: 8 December 2015

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1449582158

Digital Object Identifier
doi:10.1214/15-EJS1082

Mathematical Reviews number (MathSciNet)
MR3432429

Zentralblatt MATH identifier
1329.62279

Citation

Devijver, Emilie. Finite mixture regression: A sparse variable selection by model selection for clustering. Electron. J. Statist. 9 (2015), no. 2, 2642-2674. doi:10.1214/15-EJS1082. https://projecteuclid.org/euclid.ejs/1449582158