The Annals of Statistics

Multiscale likelihood analysis and complexity penalized estimation

Eric D. Kolaczyk and Robert D. Nowak


We describe here a framework for a certain class of multiscale likelihood factorizations wherein, in analogy to a wavelet decomposition of an $L^2$ function, a given likelihood function has an alternative representation as a product of conditional densities reflecting information in both the data and the parameter vector localized in position and scale. The framework is developed as a set of sufficient conditions for the existence of such factorizations, formulated in analogy to those underlying a standard multiresolution analysis for wavelets, and hence can be viewed as a multiresolution analysis for likelihoods. We then consider the use of these factorizations in the task of nonparametric, complexity penalized likelihood estimation. We study the risk properties of certain thresholding and partitioning estimators, and demonstrate their adaptivity and near-optimality, in a minimax sense over a broad range of function spaces, based on squared Hellinger distance as a loss function. In particular, our results provide an illustration of how properties of classical wavelet-based estimators can be obtained in a single, unified framework that includes models for continuous, count and categorical data types.
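As a rough illustration of what such a factorization can look like for count data, the sketch below takes independent Poisson counts on a dyadic grid and rewrites their joint log-likelihood as one Poisson term for the grand total plus binomial terms for each left child given its parent sum, in the spirit of a Haar decomposition. This is a minimal sketch under stated assumptions, not the authors' implementation: the Poisson model, the dyadic pairing, and the function names (`direct_loglik`, `multiscale_loglik`) are chosen here for illustration only.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson(lam) probability of observing count k."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def binomial_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability of k successes (0 < p < 1)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def direct_loglik(counts, rates):
    """Joint log-likelihood of independent Poisson counts, term by term."""
    return sum(poisson_logpmf(x, lam) for x, lam in zip(counts, rates))


def multiscale_loglik(counts, rates):
    """Same log-likelihood, accumulated scale by scale.

    Assumes len(counts) is a power of two and all rates are positive
    (illustrative assumptions). At each scale, adjacent pairs are merged;
    the left child given the pair sum is Binomial(sum, left_rate / pair_rate).
    """
    xs, lams = list(counts), list(rates)
    total = 0.0
    while len(xs) > 1:
        next_xs, next_lams = [], []
        for i in range(0, len(xs), 2):
            parent_x = xs[i] + xs[i + 1]
            parent_lam = lams[i] + lams[i + 1]
            # Conditional term localized in position (i) and scale (len(xs)).
            total += binomial_logpmf(xs[i], parent_x, lams[i] / parent_lam)
            next_xs.append(parent_x)
            next_lams.append(parent_lam)
        xs, lams = next_xs, next_lams
    # Coarsest scale: the grand total is Poisson with the summed rate.
    total += poisson_logpmf(xs[0], lams[0])
    return total
```

Calling both functions on the same counts and rates should give the same value up to floating-point error, since the Poisson-binomial identity holds exactly at every pair; the multiscale form simply regroups the information by position and scale.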

Article information

Ann. Statist., Volume 32, Number 2 (2004), 500-527.

First available in Project Euclid: 28 April 2004

Primary: 62C20: Minimax procedures; 62G05: Estimation
Secondary: 60E05: Distributions: general theory

Keywords: Factorization; Haar bases; Hellinger distance; Kullback–Leibler divergence; minimax; model selection; multiresolution; recursive partitioning; thresholding estimators; wavelets


Kolaczyk, Eric D.; Nowak, Robert D. Multiscale likelihood analysis and complexity penalized estimation. Ann. Statist. 32 (2004), no. 2, 500--527. doi:10.1214/009053604000000076.

