Bernoulli

Volume 25, Number 4B (2019), 3912-3938.

Rademacher complexity for Markov chains: Applications to kernel smoothing and Metropolis–Hastings

Abstract

The concept of Rademacher complexity for independent sequences of random variables is extended to Markov chains. The proposed notion of “regenerative block Rademacher complexity” (of a class of functions) follows from renewal theory and makes it possible to control the expected values of suprema (over the class of functions) of empirical processes based on Harris Markov chains, as well as the excess probability. For classes of Vapnik–Chervonenkis type, bounds on the “regenerative block Rademacher complexity” are established. These bounds depend essentially on the sample size and the probability tails of the regeneration times. The proposed approach is employed to obtain convergence rates for the kernel density estimator of the stationary measure and to derive concentration inequalities for the Metropolis–Hastings algorithm.
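As an informal illustration of the block idea described in the abstract, the following minimal sketch cuts a simulated path into regeneration blocks and estimates a block-level Rademacher average by Monte Carlo. Everything specific here is an assumption made for illustration and is not taken from the paper: the chain (a reflected random walk whose atom is the state 0), the finite class of functions f_t(x) = 1{x <= t}, the normalisation by the number of blocks, and all function names.

```python
import random


def simulate_chain(n_steps, p_up=0.4, seed=0):
    """Reflected random walk on {0, 1, 2, ...}; state 0 serves as the atom (illustrative choice)."""
    rng = random.Random(seed)
    x, path = 0, []
    for _ in range(n_steps):
        path.append(x)
        if rng.random() < p_up:
            x += 1
        else:
            x = max(x - 1, 0)
    return path


def regeneration_blocks(path, atom=0):
    """Split the path into complete blocks, each starting at a visit to the atom.

    The segment before the first visit and the final incomplete block are dropped.
    """
    starts = [t for t, x in enumerate(path) if x == atom]
    return [path[a:b] for a, b in zip(starts, starts[1:])]


def block_sums(blocks, thresholds):
    """For each block B and each f_t(x) = 1{x <= t}, compute f_t(B) = sum of f_t(x) over x in B."""
    return [[sum(1 for x in b if x <= t) for t in thresholds] for b in blocks]


def block_rademacher(sums, n_draws=100, seed=1):
    """Monte Carlo estimate of E_eps sup_f |sum_i eps_i f(B_i)| / n, conditional on the blocks.

    Dividing by the number of blocks n is an assumed normalisation.
    """
    rng = random.Random(seed)
    n, k = len(sums), len(sums[0])
    total = 0.0
    for _ in range(n_draws):
        # One independent Rademacher sign per block, not per observation.
        eps = [rng.choice((-1, 1)) for _ in range(n)]
        total += max(abs(sum(e * row[j] for e, row in zip(eps, sums))) for j in range(k))
    return total / (n_draws * n)


path = simulate_chain(5000)
blocks = regeneration_blocks(path)
sums = block_sums(blocks, thresholds=range(6))
print(len(blocks), block_rademacher(sums))
```

The signs are attached to whole blocks because, for a chain with an atom, the blocks between successive visits are independent and identically distributed, so the usual symmetrisation argument applies at the block level. The block lengths are the regeneration times whose tails enter the bounds mentioned in the abstract.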

Article information

Source
Bernoulli, Volume 25, Number 4B (2019), 3912-3938.

Dates
Revised: December 2018
First available in Project Euclid: 25 September 2019

https://projecteuclid.org/euclid.bj/1569398789

Digital Object Identifier
doi:10.3150/19-BEJ1115

Mathematical Reviews number (MathSciNet)
MR4010977

Zentralblatt MATH identifier
07110160

Citation

Bertail, Patrice; Portier, François. Rademacher complexity for Markov chains: Applications to kernel smoothing and Metropolis–Hastings. Bernoulli 25 (2019), no. 4B, 3912–3938. doi:10.3150/19-BEJ1115. https://projecteuclid.org/euclid.bj/1569398789
