Electronic Journal of Statistics

Optimal exponential bounds for aggregation of estimators for the Kullback-Leibler loss

Abstract

We study the problem of aggregation of estimators with respect to the Kullback-Leibler divergence for various probabilistic models. Rather than considering a convex combination of the initial estimators $f_{1},\ldots,f_{N}$, our aggregation procedures rely on the convex combination of the logarithms of these functions. The first method is designed for probability density estimation as it gives an aggregate estimator that is also a proper density function, whereas the second method concerns spectral density estimation and has no such mass-conserving feature. We select the aggregation weights based on a penalized maximum likelihood criterion. We give sharp oracle inequalities that hold with high probability, with a remainder term that is decomposed into a bias and a variance part. We also show the optimality of the remainder terms by providing the corresponding lower bound results.
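As an illustration of aggregating on the logarithmic scale, the following minimal sketch forms a convex combination of the log-densities, exponentiates it, and renormalizes so that the result integrates to one. It only illustrates the mass-conserving density case described above and is not the authors' implementation: the function names `log_aggregate` and `gaussian`, the uniform grid, the Riemann-sum normalization, and the fixed weights are all assumptions made for the example. The penalized maximum likelihood selection of the weights is not shown.

```python
import math


def log_aggregate(densities, weights, grid):
    """Normalized convex combination of log-densities on a uniform grid.

    densities: callables f_k(x) returning strictly positive values (assumed)
    weights:   convex weights (non-negative, summing to 1)
    grid:      uniformly spaced points covering the support (assumed)
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Convex combination of the logarithms at each grid point.
    log_vals = [
        sum(w * math.log(f(x)) for w, f in zip(weights, densities))
        for x in grid
    ]
    step = grid[1] - grid[0]
    # Riemann-sum approximation of the normalizing constant.
    mass = sum(math.exp(v) for v in log_vals) * step
    return [math.exp(v) / mass for v in log_vals]


def gaussian(mean, sd):
    """Gaussian density, used here only as a toy initial estimator."""
    return lambda x: math.exp(-0.5 * ((x - mean) / sd) ** 2) / (
        sd * math.sqrt(2 * math.pi)
    )


# Two toy estimators with equal weights (hypothetical values).
grid = [-10 + 0.01 * i for i in range(2001)]
agg = log_aggregate([gaussian(-1.0, 1.0), gaussian(1.0, 1.0)], [0.5, 0.5], grid)
total_mass = sum(agg) * (grid[1] - grid[0])
```

With these toy inputs the aggregate is proportional to a standard Gaussian, so it stays unimodal, whereas a convex combination of the two densities themselves would be a two-component mixture. Dropping the division by `mass` gives the unnormalized variant, which corresponds to the setting without the mass-conserving feature.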

Article information

Source
Electron. J. Statist., Volume 11, Number 1 (2017), 2258-2294.

Dates
First available in Project Euclid: 23 May 2017

https://projecteuclid.org/euclid.ejs/1495504916

Digital Object Identifier
doi:10.1214/17-EJS1269

Mathematical Reviews number (MathSciNet)
MR3654825

Zentralblatt MATH identifier
1364.62082

Subjects
Primary: 62G07: Density estimation; 62M15: Spectral analysis
Secondary: 62G05: Estimation

Citation

Butucea, Cristina; Delmas, Jean-François; Dutfoy, Anne; Fischer, Richard. Optimal exponential bounds for aggregation of estimators for the Kullback-Leibler loss. Electron. J. Statist. 11 (2017), no. 1, 2258–2294. doi:10.1214/17-EJS1269. https://projecteuclid.org/euclid.ejs/1495504916
