Abstract
Suppose $k$-variate data are drawn from a mixture of two distributions, each having independent components. It is desired to estimate the univariate marginal distributions in each of the products, as well as the mixing proportion. This is the setting of two-class, fully parametrized latent models that have been proposed for estimating the distributions of medical test results when disease status is unavailable. The problem is one of inference in a mixture of distributions without training data, and until now it has been tackled only in a fully parametric setting. We investigate the possibility of using nonparametric methods. Of course, when $k=1$ the problem is not identifiable from a nonparametric viewpoint. We show that the problem is "almost" identifiable when $k=2$; there, the set of all possible representations can be expressed, in terms of any one of those representations, as a two-parameter family. Furthermore, we prove that when $k\geq3$ the problem is nonparametrically identifiable under particularly mild regularity conditions. In this case we introduce root-$n$ consistent nonparametric estimators of the $2k$ univariate marginal distributions and the mixing proportion. Finite-sample and asymptotic properties of the estimators are described.
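To make the model concrete, write the joint distribution as $F(x) = p\prod_{j=1}^{k} F_{1j}(x_j) + (1-p)\prod_{j=1}^{k} F_{2j}(x_j)$, where $p$ is the mixing proportion and $F_{1j}, F_{2j}$ are the univariate marginals of the two products (this notation is introduced here for illustration). The sketch below only simulates data from this model; it is not the authors' estimator. The normal marginals, the means, and the helper names `sample_mixture`, `sample_cov` and `model_cov` are hypothetical choices. It checks the elementary identity $\mathrm{Cov}(X_j, X_l) = p(1-p)(\mu_{1j}-\mu_{2j})(\mu_{1l}-\mu_{2l})$ for $j \neq l$, where $\mu_{cj}$ is the mean of $F_{cj}$.

```python
import random


def sample_mixture(n, p, means1, means2, sd=1.0, seed=0):
    """Draw n k-variate points from p * prod_j F1j + (1 - p) * prod_j F2j.

    Illustrative assumption: every univariate marginal is normal with
    standard deviation `sd`; class 1 has means `means1`, class 2 has `means2`.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Latent class label: class 1 with probability p (never observed).
        means = means1 if rng.random() < p else means2
        # Given the class, the k coordinates are drawn independently.
        data.append([rng.gauss(m, sd) for m in means])
    return data


def sample_cov(data, j, l):
    """Unbiased sample covariance between coordinates j and l."""
    n = len(data)
    mj = sum(row[j] for row in data) / n
    ml = sum(row[l] for row in data) / n
    return sum((row[j] - mj) * (row[l] - ml) for row in data) / (n - 1)


def model_cov(p, means1, means2, j, l):
    """Population covariance of coordinates j != l under the mixture."""
    return p * (1 - p) * (means1[j] - means2[j]) * (means1[l] - means2[l])


if __name__ == "__main__":
    p, m1, m2 = 0.3, [0.0, 0.0, 0.0], [2.0, 1.0, -1.0]
    data = sample_mixture(100_000, p, m1, m2)
    for j, l in [(0, 1), (0, 2), (1, 2)]:
        print(j, l, round(sample_cov(data, j, l), 3), model_cov(p, m1, m2, j, l))
```

Within each class the coordinates are independent; the dependence observed marginally comes entirely from the unobserved class label.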
Citation
Peter Hall. Xiao-Hua Zhou. "Nonparametric estimation of component distributions in a multivariate mixture." Ann. Statist. 31 (1) 201–224, February 2003. https://doi.org/10.1214/aos/1046294462