• Bernoulli
  • Volume 25, Number 1 (2019), 464-498.

Consistent order estimation for nonparametric hidden Markov models

Luc Lehéricy


We consider the problem of estimating the number of hidden states (the order) of a nonparametric hidden Markov model (HMM). We propose two different methods and prove their almost sure consistency without any prior assumption, be it on the order or on the emission distributions. This is the first time a consistency result is proved in such a general setting without using restrictive assumptions such as a priori upper bounds on the order or parametric restrictions on the emission distributions. Our main method relies on the minimization of a penalized least squares criterion. In addition to the consistency of the order estimation, we also prove that this method yields rate minimax adaptive estimators of the parameters of the HMM – up to a logarithmic factor. Our second method relies on estimating the rank of a matrix obtained from the distribution of two consecutive observations. Finally, numerical experiments are used to compare both methods and study their ability to select the right order in several situations.
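To make the idea behind the second method concrete, here is a minimal sketch of order estimation through the rank of a matrix built from pairs of consecutive observations. It is an illustration under stated assumptions, not the procedure analysed in the paper. The histogram projection, the Gaussian emissions used to simulate data, the function names (`simulate_hmm`, `pair_matrix`, `estimate_order`) and the naive threshold `2 * sqrt(log(n) / n)` are all choices made for this example.

```python
import numpy as np


def simulate_hmm(n, Q, means, sd, rng):
    """Simulate n observations from an HMM with Gaussian emissions.

    Gaussian emissions are an assumption made only to generate test data.
    """
    K = Q.shape[0]
    states = np.empty(n, dtype=int)
    states[0] = rng.integers(K)
    for t in range(1, n):
        # Draw the next hidden state from the row of the transition matrix.
        states[t] = rng.choice(K, p=Q[states[t - 1]])
    return rng.normal(means[states], sd)


def pair_matrix(y, edges):
    """Empirical distribution of (Y_t, Y_{t+1}) on a histogram grid."""
    m = len(edges) - 1
    # Map each observation to a bin; out-of-range values go to the end bins.
    idx = np.clip(np.digitize(y, edges) - 1, 0, m - 1)
    counts = np.zeros((m, m))
    np.add.at(counts, (idx[:-1], idx[1:]), 1.0)
    return counts / (len(y) - 1)


def estimate_order(y, edges, threshold):
    """Count the singular values of the pair matrix above a threshold."""
    s = np.linalg.svd(pair_matrix(y, edges), compute_uv=False)
    return int(np.sum(s > threshold)), s


rng = np.random.default_rng(0)
Q = np.array([[0.9, 0.1], [0.2, 0.8]])   # hypothetical 2-state chain
means = np.array([-2.0, 2.0])            # hypothetical emission means
n = 20000
y = simulate_hmm(n, Q, means, 1.0, rng)

edges = np.linspace(-5.0, 5.0, 9)        # 8 histogram bins
threshold = 2.0 * np.sqrt(np.log(n) / n) # illustrative, uncalibrated choice
k_hat, singular_values = estimate_order(y, edges, threshold)
print("estimated order:", k_hat)
print("leading singular values:", np.round(singular_values[:4], 4))
```

When the transition matrix and the projected emission distributions have full rank, the population version of this matrix has rank equal to the number of hidden states, so the number of singular values clearly separated from zero gives a candidate order. The threshold drives the result, and a fixed rule like the one above should be read as a placeholder for a properly calibrated one.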

Article information

Bernoulli, Volume 25, Number 1 (2019), 464-498.

Received: April 2017
Revised: September 2017
First available in Project Euclid: 12 December 2018

Keywords: hidden Markov model; least squares method; model selection; nonparametric density estimation; order estimation; spectral method


Lehéricy, Luc. Consistent order estimation for nonparametric hidden Markov models. Bernoulli 25 (2019), no. 1, 464–498. doi:10.3150/17-BEJ993.


Supplemental materials

  • Supplement A: Additional proofs. We provide the algorithms we used in our simulations as well as the omitted proofs of our results.