## The Annals of Statistics

### Sparse recovery in convex hulls via entropy penalization

#### Abstract

Let (X, Y) be a random couple in S×T with unknown distribution P and let (X_1, Y_1), …, (X_n, Y_n) be i.i.d. copies of (X, Y). Denote by P_n the empirical distribution of (X_1, Y_1), …, (X_n, Y_n). Let h_1, …, h_N: S↦[−1, 1] be a dictionary consisting of N functions. For λ∈ℝ^N, denote f_λ := ∑_{j=1}^{N} λ_j h_j. Let ℓ: T×ℝ↦ℝ be a given loss function, convex with respect to the second variable, and set (ℓ•f)(x, y) := ℓ(y; f(x)). Finally, let Λ⊂ℝ^N be the simplex of all probability distributions on {1, …, N}. Consider the following penalized empirical risk minimization problem $$\begin{eqnarray*}\hat{\lambda}^{\varepsilon}:={\mathop{\textrm{argmin}}_{\lambda\in \Lambda}}\Biggl[P_{n}(\ell \bullet f_{\lambda})+\varepsilon \sum_{j=1}^{N}\lambda_{j}\log \lambda_{j}\Biggr]\end{eqnarray*}$$ along with its distribution-dependent version $$\begin{eqnarray*}\lambda^{\varepsilon}:={\mathop{\textrm{argmin}}_{\lambda\in \Lambda}}\Biggl[P(\ell \bullet f_{\lambda})+\varepsilon \sum_{j=1}^{N}\lambda_{j}\log \lambda_{j}\Biggr],\end{eqnarray*}$$ where ε≥0 is a regularization parameter. It is proved that the “approximate sparsity” of λ^ε implies the “approximate sparsity” of λ̂^ε, and the impact of “sparsity” on bounding the excess risk of the empirical solution is explored. Similar results are also discussed in the case of entropy-penalized density estimation.
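The penalized empirical objective above can be minimized numerically over the simplex Λ. Below is a minimal sketch, not the paper's method: it assumes the squared loss ℓ(y, u) = (y − u)², a precomputed dictionary matrix `H` with H[i, j] = h_j(X_i), and uses exponentiated-gradient updates (mirror descent with the entropy mirror map), whose multiplicative form keeps the iterates inside the simplex automatically. The step size `eta` and iteration count are illustrative choices.

```python
import numpy as np

def entropy_penalized_erm(H, y, eps=0.1, eta=0.1, n_iter=2000):
    """Approximate lambda-hat^eps for squared loss via exponentiated gradient.

    H   : (n, N) array, H[i, j] = h_j(X_i), entries assumed in [-1, 1].
    y   : (n,) array of responses Y_i.
    eps : entropy regularization parameter (epsilon >= 0 in the abstract).
    Returns a probability vector over the N dictionary functions.
    """
    n, N = H.shape
    lam = np.full(N, 1.0 / N)            # start at the barycenter of the simplex
    for _ in range(n_iter):
        resid = H @ lam - y              # f_lambda(X_i) - Y_i
        # Gradient of P_n(ell . f_lambda) + eps * sum_j lam_j log lam_j
        grad = 2.0 * (H.T @ resid) / n + eps * (np.log(lam) + 1.0)
        lam = lam * np.exp(-eta * grad)  # multiplicative (mirror-descent) update
        lam /= lam.sum()                 # renormalize onto the simplex
    return lam
```

With a small ε the solution tracks the unpenalized empirical minimizer over Λ, which tends to concentrate mass on few dictionary functions; the entropy term pulls it slightly toward the uniform vector.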

#### Article information

Source
Ann. Statist., Volume 37, Number 3 (2009), 1332-1359.

Dates
First available in Project Euclid: 10 April 2009

https://projecteuclid.org/euclid.aos/1239369024

Digital Object Identifier
doi:10.1214/08-AOS621

Mathematical Reviews number (MathSciNet)
MR2509076

Zentralblatt MATH identifier
1269.62039

#### Citation

Koltchinskii, Vladimir. Sparse recovery in convex hulls via entropy penalization. Ann. Statist. 37 (2009), no. 3, 1332--1359. doi:10.1214/08-AOS621. https://projecteuclid.org/euclid.aos/1239369024

#### References

• Audibert, J.-Y. (2004). Une approche PAC-bayésienne de la théorie statistique de l’apprentissage [A PAC-Bayesian approach to statistical learning theory]. Ph.D. thesis, Univ. Paris 6.
• Bickel, P., Ritov, Y. and Tsybakov, A. (2008). Simultaneous analysis of LASSO and Dantzig selector. Ann. Statist. To appear.
• Bunea, F., Tsybakov, A. and Wegkamp, M. (2007a). Sparsity oracle inequalities for the LASSO. Electron. J. Stat. 1 169–194.
• Bunea, F., Tsybakov, A. and Wegkamp, M. (2007b). Sparse density estimation with ℓ1 penalties. In Proc. 20th Annual Conference on Learning Theory (COLT 2007) 530–543. Lecture Notes in Artificial Intelligence 4539. Springer, Berlin.
• Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2392–2404.
• Catoni, O. (2004). Statistical learning theory and stochastic optimization. In École d’Été de Probabilités de Saint-Flour XXXI-2001. Lecture Notes in Mathematics 1851. Springer, New York.
• Dalalyan, A. and Tsybakov, A. (2007). Aggregation by exponential weighting and sharp oracle inequalities. In Proc. 20th Annual Conference on Learning Theory (COLT 2007) 97–111. Lecture Notes in Artificial Intelligence 4539. Springer, Berlin.
• Donoho, D. L. (2006). For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59 797–829.
• Koltchinskii, V. (2005). Model selection and aggregation in sparse classification problems. Oberwolfach Reports: Meeting on Statistical and Probabilistic Methods of Model Selection, October 2005.
• Koltchinskii, V. (2006). Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist. 34 2593–2656.
• Koltchinskii, V. (2008a). Sparsity in penalized empirical risk minimization. Ann. Inst. H. Poincaré Probab. Statist. To appear.
• Koltchinskii, V. (2008b). The Dantzig selector and sparsity oracle inequalities. Preprint.
• Massart, P. (2007). Concentration inequalities and model selection. In École d’Été de Probabilités de Saint-Flour 2003. Springer, Berlin.
• McAllester, D. A. (1999). Some PAC-Bayesian theorems. Machine Learning 37 355–363.
• Mendelson, S., Pajor, A. and Tomczak-Jaegermann, N. (2007). Reconstruction and subGaussian operators in asymptotic geometric analysis. Geomet. Funct. Anal. 17 1248–1282.
• Rudelson, M. and Vershynin, R. (2005). Geometric approach to error correcting codes and reconstruction of signals. Int. Math. Res. Not. 64 4019–4041.
• van de Geer, S. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614–645.
• Zhang, T. (2001). Regularized Winnow method. In Advances in Neural Information Processing Systems 13 (NIPS 2000) (T. K. Leen, T. G. Dietterich and V. Tresp, eds.) 703–709. MIT Press.
• Zhang, T. (2006a). From epsilon-entropy to KL-complexity: Analysis of minimum information complexity density estimation. Ann. Statist. 34 2180–2210.
• Zhang, T. (2006b). Information theoretical upper and lower bounds for statistical estimation. IEEE Trans. Inform. Theory 52 1307–1321.