## The Annals of Applied Probability

### Finite-length analysis on tail probability for Markov chain and application to simple hypothesis testing

#### Abstract

Using the terminology of information geometry, we derive upper and lower bounds on the tail probability of the sample mean for a Markov chain with a finite state space. Employing these bounds, we obtain upper and lower bounds on the minimum type-2 error probability under an exponential constraint on the type-1 error probability in simple hypothesis testing for a finite-length Markov chain, which yields a Hoeffding-type bound. For these derivations, we derive upper and lower bounds on the cumulant generating function for a Markov chain with a finite state space. As a byproduct, we obtain another simple proof of the central limit theorem for a Markov chain with a finite state space.

#### Article information

Source
Ann. Appl. Probab., Volume 27, Number 2 (2017), 811-845.

Dates
Revised: May 2016
First available in Project Euclid: 26 May 2017

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1495764367

Digital Object Identifier
doi:10.1214/16-AAP1216

Mathematical Reviews number (MathSciNet)
MR3655854

Zentralblatt MATH identifier
1368.62235

#### Citation

Watanabe, Shun; Hayashi, Masahito. Finite-length analysis on tail probability for Markov chain and application to simple hypothesis testing. Ann. Appl. Probab. 27 (2017), no. 2, 811–845. doi:10.1214/16-AAP1216. https://projecteuclid.org/euclid.aoap/1495764367
