## The Annals of Statistics

- Ann. Statist.
- Volume 45, Number 6 (2017), 2708-2735.

### Estimating a probability mass function with unknown labels

Dragi Anevski, Richard D. Gill, and Stefan Zohren

#### Abstract

In the context of a species sampling problem, we discuss a nonparametric maximum likelihood estimator for the underlying probability mass function. The estimator is known in the computer science literature as the high profile estimator. We prove strong consistency and derive the rates of convergence for an extended model version of the estimator. We also study a sieved estimator for which similar consistency results are derived. Numerical computation of the sieved estimator is of great interest for practical problems, such as forensic DNA analysis, and we present a computational algorithm based on the stochastic approximation of the expectation maximisation algorithm. As an interesting byproduct of the numerical analyses, we introduce an algorithm for bounded isotonic regression for which we also prove convergence.
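To make the unknown-label setting concrete, the following is a minimal Python sketch of the data reduction it implies: when labels carry no information, a sample is summarised by its category frequencies sorted in decreasing order. This is not the estimator or the algorithm of the paper. The function name `sorted_frequencies` and the two toy samples are hypothetical and serve only as an illustration.

```python
from collections import Counter


def sorted_frequencies(sample):
    """Return the relative frequencies of the observed categories, largest first.

    The labels themselves are discarded, which mirrors the unknown-label
    setting: only the ordered counts are kept.
    """
    counts = sorted(Counter(sample).values(), reverse=True)
    n = len(sample)
    return [c / n for c in counts]


# Two hypothetical samples that differ only by a relabelling of the categories.
a = ["x", "y", "x", "z", "x", "y"]
b = ["q", "r", "q", "s", "q", "r"]

print(sorted_frequencies(a))
print(sorted_frequencies(b))
```

Both calls print the same decreasing vector of frequencies, since relabelling the categories does not change the ordered counts.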

#### Article information

**Dates**

Received: May 2016

First available in Project Euclid: 15 December 2017

**Permanent link to this document**

https://projecteuclid.org/euclid.aos/1513328588

**Digital Object Identifier**

doi:10.1214/17-AOS1542

**Mathematical Reviews number (MathSciNet)**

MR3737907

**Zentralblatt MATH identifier**

06838148

**Subjects**

Primary:

- 62G05: Estimation
- 62G20: Asymptotic properties
- 65C60: Computational problems in statistics
- 62P10: Applications to biology and medical sciences

**Keywords**

NPMLE, high profile, probability mass function, strong consistency, sieve, ordered, monotone rearrangement, nonparametric, SA-EM, rates

#### Citation

Anevski, Dragi; Gill, Richard D.; Zohren, Stefan. Estimating a probability mass function with unknown labels. Ann. Statist. 45 (2017), no. 6, 2708--2735. doi:10.1214/17-AOS1542. https://projecteuclid.org/euclid.aos/1513328588

#### Supplemental materials

- Supplement to “Estimating a probability mass function with unknown labels”. The supplement consists of Supplement A: Existence of the PML; Supplement B: Computation of the PML; and Supplement C: An algorithm for estimating a decreasing multinomial probability with lower bound. Digital Object Identifier: doi:10.1214/17-AOS1542SUPP