Bernoulli

Volume 25, Number 4B (2019), 3421-3458.

Concentration of weakly dependent Banach-valued sums and applications to statistical learning methods

Gilles Blanchard and Oleksandr Zadorozhnyi


Abstract

We obtain a Bernstein-type inequality for sums of Banach-valued random variables that satisfy a general weak dependence assumption, under certain smoothness assumptions on the underlying Banach norm. We use this inequality to investigate, in the asymptotic regime, upper bounds on the error of a broad family of spectral regularization methods for reproducing kernel decision rules trained on a sample drawn from a $\tau$-mixing process.
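To make "spectral regularization methods for reproducing kernel decision rules" concrete, here is a minimal sketch assuming the common filter-function formulation of kernel least squares: each method applies a filter g_λ to the eigenvalues of the normalized kernel matrix K/n, and the estimator is f = Σ α_i k(x_i, ·) with α = (1/n) g_λ(K/n) y. It is an illustration only. It does not implement the article's Bernstein-type inequality or reproduce its exact setting. The Gaussian kernel and its width, the AR(1) coefficient 0.7 (a simple, assumed example of a weakly dependent sample), the value of λ and the target function sin(2x) are all hypothetical choices.

```python
import numpy as np


def gaussian_kernel(a, b, width=0.5):
    """Gaussian (RBF) kernel matrix between 1-D sample arrays a and b (hypothetical choice)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * width**2))


# Filter functions g_lambda(sigma), applied to eigenvalues sigma of K / n.
def tikhonov(sigma, lam):
    # Tikhonov filter: 1 / (sigma + lambda), i.e. kernel ridge regression.
    return 1.0 / (sigma + lam)


def spectral_cutoff(sigma, lam):
    # Invert eigenvalues at or above the threshold lambda, discard the rest.
    out = np.zeros_like(sigma)
    keep = sigma >= lam
    out[keep] = 1.0 / sigma[keep]
    return out


def landweber(sigma, lam, step=1.0):
    # t = ceil(1/lambda) gradient steps: g(sigma) = sum_{j<t} step * (1 - step*sigma)^j.
    # step = 1 is stable here because the eigenvalues of K / n lie in [0, 1].
    t = int(np.ceil(1.0 / lam))
    return step * sum((1.0 - step * sigma) ** j for j in range(t))


def fit_spectral(x, y, filt, lam):
    """Coefficients alpha such that f(.) = sum_i alpha_i k(x_i, .)."""
    n = len(x)
    sigma, u = np.linalg.eigh(gaussian_kernel(x, x) / n)
    # alpha = (1/n) * g_lambda(K/n) y, evaluated through the eigendecomposition.
    return u @ (filt(sigma, lam) * (u.T @ y)) / n


def predict(x_train, alpha, x_new):
    return gaussian_kernel(x_new, x_train) @ alpha


# Dependent training sample: stationary Gaussian AR(1) inputs (assumed example).
rng = np.random.default_rng(0)
n = 200
x = np.empty(n)
x[0] = rng.normal()
for i in range(1, n):
    x[i] = 0.7 * x[i - 1] + rng.normal(scale=np.sqrt(1.0 - 0.7**2))
y = np.sin(2.0 * x) + 0.1 * rng.normal(size=n)

lam = 1e-3
x_test = np.linspace(-2.0, 2.0, 5)
for name, filt in [("tikhonov", tikhonov), ("cutoff", spectral_cutoff), ("landweber", landweber)]:
    alpha = fit_spectral(x, y, filt, lam)
    print(name, np.round(predict(x, alpha, x_test), 2))
```

All three methods share `fit_spectral` and differ only in the filter passed to it, which is the sense in which they form one family. With the Tikhonov filter the coefficients reduce to the familiar kernel ridge regression solution α = (K + nλI)^{-1} y.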

Article information

Source
Bernoulli, Volume 25, Number 4B (2019), 3421-3458.

Dates
Received: January 2018
Revised: October 2018
First available in Project Euclid: 25 September 2019

Permanent link to this document
https://projecteuclid.org/euclid.bj/1569398772

Digital Object Identifier
doi:10.3150/18-BEJ1095

Mathematical Reviews number (MathSciNet)
MR4010960

Zentralblatt MATH identifier
07110143

Keywords
Banach-valued process; Bernstein inequality; concentration; spectral regularization; weak dependence

Citation

Blanchard, Gilles; Zadorozhnyi, Oleksandr. Concentration of weakly dependent Banach-valued sums and applications to statistical learning methods. Bernoulli 25 (2019), no. 4B, 3421--3458. doi:10.3150/18-BEJ1095. https://projecteuclid.org/euclid.bj/1569398772


