The Annals of Statistics

Consistencies and rates of convergence of jump-penalized least squares estimators

Leif Boysen, Angela Kempe, Volkmar Liebscher, Axel Munk, and Olaf Wittich


We study the asymptotics of jump-penalized least squares regression, which aims to approximate a regression function by piecewise constant functions. Besides conventional consistency and convergence rates of the estimates in L2([0, 1)), our results cover other metrics such as the Skorokhod metric on the space of càdlàg functions and uniform metrics on C([0, 1]). We show that these estimators are, in an adaptive sense, rate optimal over certain classes of “approximation spaces.” Special cases are the class of functions of bounded variation, (piecewise) Hölder continuous functions of order 0<α≤1, and the class of step functions with a finite but arbitrary number of jumps. In the latter setting, we also deduce the rates known from change-point analysis for detecting the jumps. Finally, we address the fully automatic selection of the smoothing parameter.
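
To make the estimator concrete, the following is a minimal sketch of a jump-penalized least squares fit on discrete data. It assumes the unnormalized criterion "sum of squared residuals plus gamma times the number of jumps"; the abstract does not state how the penalty is scaled with the sample size, so that choice, the function name `potts_fit`, and the parameter name `gamma` are assumptions made for illustration. The minimizer is found by a standard O(n²) dynamic program over the position of the last jump, which is not necessarily the computational method used by the authors.

```python
from itertools import accumulate


def potts_fit(y, gamma):
    """Return the step function minimizing
    sum((y_i - f_i)^2) + gamma * (number of jumps of f).

    Hypothetical helper for illustration; the penalty scaling is an assumption.
    """
    n = len(y)
    if n == 0:
        return []

    # Prefix sums of y and y^2 give the squared error of any segment in O(1).
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def sse(i, j):
        # Squared error when y[i:j] is replaced by its mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[j] is the minimal penalized cost for the first j observations.
    # Every segment is charged gamma, and best[0] = -gamma refunds the first
    # one, so the total penalty is gamma * (segments - 1) = gamma * jumps.
    best = [-gamma] + [float("inf")] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + gamma + sse(i, j)
            if cost < best[j]:
                best[j], prev[j] = cost, i

    # Backtrack through the stored segment starts and fill in segment means.
    fit = [0.0] * n
    j = n
    while j > 0:
        i = prev[j]
        fit[i:j] = [(s1[j] - s1[i]) / (j - i)] * (j - i)
        j = i
    return fit
```

For the data `[0, 0, 0, 5, 5, 5]`, a small penalty such as `gamma = 1.0` keeps the single jump, while a very large penalty such as `gamma = 1000.0` returns the constant fit at the overall mean 2.5. This illustrates the role of gamma as the smoothing parameter whose selection the abstract mentions.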

Article information

Ann. Statist., Volume 37, Number 1 (2009), 157-183.

First available in Project Euclid: 16 January 2009

Primary: 62G05: Estimation; 62G20: Asymptotic properties
Secondary: 41A10: Approximation by polynomials {For approximation by trigonometric polynomials, see 42A10}; 41A25: Rate of convergence, degree of approximation

Jump detection; adaptive estimation; penalized maximum likelihood; approximation spaces; change-point analysis; multiscale resolution analysis; Potts functional; nonparametric regression; regressogram; Skorokhod topology; variable selection


Boysen, Leif; Kempe, Angela; Liebscher, Volkmar; Munk, Axel; Wittich, Olaf. Consistencies and rates of convergence of jump-penalized least squares estimators. Ann. Statist. 37 (2009), no. 1, 157–183. doi:10.1214/07-AOS558.


