## The Annals of Statistics

### Approximate $\ell_{0}$-penalized estimation of piecewise-constant signals on graphs

#### Abstract

We study recovery of piecewise-constant signals on graphs by the estimator minimizing an $\ell_{0}$-edge-penalized objective. Although exact minimization of this objective may be computationally intractable, we show that the same statistical risk guarantees are achieved by the $\alpha$-expansion algorithm which computes an approximate minimizer in polynomial time. We establish that for graphs with small average vertex degree, these guarantees are minimax rate-optimal over classes of edge-sparse signals. For spatially inhomogeneous graphs, we propose minimization of an edge-weighted objective where each edge is weighted by its effective resistance or another measure of its contribution to the graph’s connectivity. We establish minimax optimality of the resulting estimators over corresponding edge-weighted sparsity classes. We show theoretically that these risk guarantees are not always achieved by the estimator minimizing the $\ell_{1}$/total-variation relaxation, and empirically that the $\ell_{0}$-based estimates are more accurate in high signal-to-noise settings.
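
To make the $\alpha$-expansion idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. We assume the objective $\tfrac{1}{2}\|y-\theta\|^{2} + \lambda\,\#\{(i,j)\in E : \theta_i \neq \theta_j\}$ (the $\tfrac{1}{2}$ scaling is our choice) and restrict each $\theta_i$ to a finite grid of candidate values, also an assumption made for illustration. Each expansion move, in which every vertex either keeps its value or switches to a fixed value $\alpha$, is solved exactly as a minimum $s$-$t$ cut using the standard construction for submodular binary energies (Kolmogorov and Zabih [39]; Boykov, Veksler and Zabih [13]). The helper names and the plain Edmonds-Karp max-flow routine are hypothetical and written only for this example.

```python
from collections import defaultdict, deque


def l0_objective(theta, y, edges, lam):
    """0.5 * ||y - theta||^2 + lam * (number of edges whose endpoints differ)."""
    fit = 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, theta))
    return fit + lam * sum(theta[i] != theta[j] for i, j in edges)


def min_cut_source_side(cap, s, t, eps=1e-12):
    """Edmonds-Karp max flow on residual capacities `cap` (mutated in place).

    Returns the vertices reachable from s in the final residual graph,
    i.e. the source side of a minimum s-t cut.
    """
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:  # push flow and update reverse residual edges
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u


def expansion_move(labels, y, edges, lam, alpha):
    """Best labeling in which each vertex keeps its label or switches to alpha.

    Binary variable x_i = 1 means "switch to alpha" and corresponds to vertex i
    ending on the sink side of the cut.
    """
    n = len(y)
    keep = [0.5 * (y[i] - labels[i]) ** 2 for i in range(n)]  # cost of x_i = 0
    switch = [0.5 * (y[i] - alpha) ** 2 for i in range(n)]    # cost of x_i = 1
    cap = defaultdict(lambda: defaultdict(float))
    for i, j in edges:
        e00 = lam * (labels[i] != labels[j])  # both keep their labels
        e01 = lam * (labels[i] != alpha)      # only j switches
        e10 = lam * (alpha != labels[j])      # only i switches
        # E(x_i, x_j) = e00 + (e10 - e00) x_i - e10 x_j + (e01 + e10 - e00)(1 - x_i) x_j,
        # using E(1, 1) = 0; the last coefficient is >= 0 since the 0/1 penalty is a metric.
        switch[i] += e10 - e00
        switch[j] -= e10
        cap[i][j] += e01 + e10 - e00  # cut exactly when x_i = 0 and x_j = 1
    for i in range(n):
        m = min(keep[i], switch[i])  # drop a constant so both capacities are >= 0
        cap["s"][i] += switch[i] - m  # cut when x_i = 1
        cap[i]["t"] += keep[i] - m    # cut when x_i = 0
    source_side = min_cut_source_side(cap, "s", "t")
    return [labels[i] if i in source_side else alpha for i in range(n)]


def alpha_expansion(y, edges, lam, grid, max_sweeps=50):
    """Cycle over candidate values, accepting expansion moves that lower the objective."""
    labels = [grid[0]] * len(y)
    best = l0_objective(labels, y, edges, lam)
    for _ in range(max_sweeps):
        improved = False
        for alpha in grid:
            proposal = expansion_move(labels, y, edges, lam, alpha)
            value = l0_objective(proposal, y, edges, lam)
            if value < best - 1e-9:
                labels, best, improved = proposal, value, True
        if not improved:
            break
    return labels, best


if __name__ == "__main__":
    y = [0.1, -0.1, 0.0, 5.1, 4.9, 5.0]
    path = [(k, k + 1) for k in range(len(y) - 1)]  # path graph 0-1-2-3-4-5
    grid = [-1.0 + 0.5 * k for k in range(15)]       # candidates -1.0, -0.5, ..., 6.0
    theta_hat, value = alpha_expansion(y, path, lam=1.0, grid=grid)
    print(theta_hat, value)
```

Every accepted move strictly lowers the objective, so the loop terminates. For a 0/1 edge penalty of this kind, Boykov, Veksler and Zabih [13] show that a labeling which no expansion move can improve has objective value within a factor of 2 of the global minimum over the same label set.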

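The edge-weighted objective mentioned in the abstract can be illustrated with a small sketch, under assumptions that are ours rather than the paper's: the Laplacian $L$ uses unit edge weights, each edge $e=(i,j)$ is weighted by its effective resistance $R_e = (e_i - e_j)^{\top} L^{+} (e_i - e_j)$, computed here with a dense pseudoinverse (practical only for small connected graphs; fast Laplacian solvers such as [45] and [59] are the scalable route), and the squared-error term carries the same illustrative $\tfrac{1}{2}$ scaling. The function names are hypothetical, and the paper's exact normalization of the weights may differ.

```python
import numpy as np


def laplacian(n, edges):
    """Unweighted graph Laplacian L = D - A of a simple undirected graph."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L


def effective_resistances(n, edges):
    """R_(i,j) = (e_i - e_j)^T L^+ (e_i - e_j) for every edge, via a dense pseudoinverse."""
    L_pinv = np.linalg.pinv(laplacian(n, edges))
    return np.array([L_pinv[i, i] + L_pinv[j, j] - 2.0 * L_pinv[i, j] for i, j in edges])


def weighted_l0_objective(theta, y, edges, weights, lam):
    """0.5 * ||y - theta||^2 + lam * (total weight of edges whose endpoints differ)."""
    fit = 0.5 * float(np.sum((np.asarray(y, float) - np.asarray(theta, float)) ** 2))
    penalty = sum(w for w, (i, j) in zip(weights, edges) if theta[i] != theta[j])
    return fit + lam * penalty


if __name__ == "__main__":
    # Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
    r = effective_resistances(6, edges)
    print(np.round(r, 3))
    print(weighted_l0_objective([0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], edges, r, lam=2.0))
```

In this example the bridge has resistance 1 and each triangle edge has resistance 2/3. More generally, any bridge has effective resistance 1, and by Foster's theorem the resistances of a connected graph on $n$ vertices sum to $n-1$. Edges in densely connected regions are therefore penalized less per edge than edges that are critical to connectivity.
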
#### Article information

Source
Ann. Statist., Volume 46, Number 6B (2018), 3217-3245.

Dates
Revised: September 2017
First available in Project Euclid: 11 September 2018

https://projecteuclid.org/euclid.aos/1536631272

Digital Object Identifier
doi:10.1214/17-AOS1656

Mathematical Reviews number (MathSciNet)
MR3852650

Zentralblatt MATH identifier
06965686

Subjects
Primary: 62G05: Estimation

#### Citation

Fan, Zhou; Guan, Leying. Approximate $\ell_{0}$-penalized estimation of piecewise-constant signals on graphs. Ann. Statist. 46 (2018), no. 6B, 3217–3245. doi:10.1214/17-AOS1656. https://projecteuclid.org/euclid.aos/1536631272

#### References

• [1] Addario-Berry, L., Broutin, N., Devroye, L. and Lugosi, G. (2010). On combinatorial testing problems. Ann. Statist. 38 3063–3092.
• [2] Arias-Castro, E., Candès, E. J. and Durand, A. (2011). Detection of an anomalous cluster in a network. Ann. Statist. 39 278–304.
• [3] Arias-Castro, E., Candès, E. J., Helgason, H. and Zeitouni, O. (2008). Searching for a trail of evidence in a maze. Ann. Statist. 36 1726–1757.
• [4] Arias-Castro, E., Donoho, D. L. and Huo, X. (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory 51 2402–2425.
• [5] Arias-Castro, E. and Grimmett, G. R. (2013). Cluster detection in networks using percolation. Bernoulli 19 676–719.
• [6] Auger, I. E. and Lawrence, C. E. (1989). Algorithms for the optimal identification of segment neighborhoods. Bull. Math. Biol. 51 39–54.
• [7] Barron, A., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301–413.
• [8] Barry, D. and Hartigan, J. A. (1993). A Bayesian analysis for change point problems. J. Amer. Statist. Assoc. 88 309–319.
• [9] Besag, J. (1986). On the statistical analysis of dirty pictures. J. Roy. Statist. Soc. Ser. B 48 259–302.
• [10] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• [11] Birgé, L. and Massart, P. (2007). Minimal penalties for Gaussian model selection. Probab. Theory Related Fields 138 33–73.
• [12] Boykov, Y. and Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 26 1124–1137.
• [13] Boykov, Y., Veksler, O. and Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 23 1222–1239.
• [14] Boysen, L., Kempe, A., Liebscher, V., Munk, A. and Wittich, O. (2009). Consistencies and rates of convergence of jump-penalized least squares estimators. Ann. Statist. 37 157–183.
• [15] Chambolle, A. (2005). Total variation minimization and a class of binary MRF models. In EMMCVPR 2005 136–152. Springer, Berlin.
• [16] Chambolle, A. and Lions, P.-L. (1997). Image recovery via total variation minimization and related problems. Numer. Math. 76 167–188.
• [17] Chen, S. S., Donoho, D. L. and Saunders, M. A. (2001). Atomic decomposition by basis pursuit. SIAM Rev. 43 129–159.
• [18] Chernoff, H. and Zacks, S. (1964). Estimating the current mean of a normal distribution which is subjected to changes in time. Ann. Math. Stat. 35 999–1018.
• [19] Dalalyan, A. S., Hebiri, M. and Lederer, J. (2017). On the prediction performance of the Lasso. Bernoulli 23 552–581.
• [20] Darbon, J. and Sigelle, M. (2005). A fast and exact algorithm for total variation minimization. In Iberian Conference on Pattern Recognition and Image Analysis 351–359. Springer, Berlin.
• [21] Davies, P. L. and Kovac, A. (2001). Local extremes, runs, strings and multiresolution. Ann. Statist. 29 1–65.
• [22] Donoho, D. L. (1999). Wedgelets: Nearly minimax estimation of edges. Ann. Statist. 27 859–897.
• [23] Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425–455.
• [24] Efron, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc. 99 96–104.
• [25] Fan, Z. and Guan, L. (2018). Supplement to “Approximate $\ell_{0}$-penalized estimation of piecewise-constant signals on graphs.” DOI:10.1214/17-AOS1656SUPP.
• [26] Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6 721–741.
• [27] Ghosh, A., Boyd, S. and Saberi, A. (2008). Minimizing effective resistance of a graph. SIAM Rev. 50 37–66.
• [28] Goldstein, T. and Osher, S. (2009). The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci. 2 323–343.
• [29] Greig, D. M., Porteous, B. T. and Seheult, A. H. (1989). Exact maximum a posteriori estimation for binary images. J. R. Stat. Soc. Ser. B. Stat. Methodol. 51 271–279.
• [30] Guntuboyina, A., Lieu, D., Chatterjee, S. and Sen, B. (2017). Spatial adaptation in trend filtering. Available at arXiv:1702.05113.
• [31] Harchaoui, Z. and Lévy-Leduc, C. (2010). Multiple change-point estimation with a total variation penalty. J. Amer. Statist. Assoc. 105 1480–1493.
• [32] Harris, X. T. (2016). Prediction error after model search. Available at arXiv:1610.06107.
• [33] Hoefling, H. (2010). A path algorithm for the fused lasso signal approximator. J. Comput. Graph. Statist. 19 984–1006. Supplementary materials available online.
• [34] Hütter, J.-C. and Rigollet, P. (2016). Optimal rates for total variation denoising. In Conf. Learning Theory 1115–1146.
• [35] Jackson, B., Scargle, J. D. et al. (2005). An algorithm for optimal partitioning of data on an interval. IEEE Signal Process. Lett. 12 105–108.
• [36] Johnstone, I. (2015). Gaussian Estimation: Sequence and Wavelet Models. Available at statweb.stanford.edu/~imj/GE09-08-15.pdf.
• [37] Karger, D. R. and Stein, C. (1996). A new approach to the minimum cut problem. J. ACM 43 601–640.
• [38] Killick, R., Fearnhead, P. and Eckley, I. A. (2012). Optimal detection of changepoints with a linear computational cost. J. Amer. Statist. Assoc. 107 1590–1598.
• [39] Kolmogorov, V. and Zabih, R. (2004). What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26 147–159.
• [40] Korostelev, A. P. and Tsybakov, A. B. (1993). Minimax Theory of Image Reconstruction. Lecture Notes in Statistics 82. Springer, New York.
• [41] Kovac, A. and Smith, A. D. (2011). Nonparametric regression on a graph. J. Comput. Graph. Statist. 20 432–447.
• [42] Land, S. R. and Friedman, J. H. (1997). Variable fusion: A new adaptive signal regression method. Technical Report 656, Dept. Statistics, Carnegie Mellon Univ., Pittsburgh, PA.
• [43] Lebarbier, É. (2005). Detecting multiple change-points in the mean of Gaussian process by model selection. Signal Process. 85 717–736.
• [44] Lin, K., Sharpnack, J., Rinaldo, A. and Tibshirani, R. J. (2016). Approximate recovery in changepoint problems, from $\ell_{2}$ estimation error rates. Available at arXiv:1606.06746.
• [45] Livne, O. E. and Brandt, A. (2012). Lean algebraic multigrid (LAMG): Fast graph Laplacian linear solver. SIAM J. Sci. Comput. 34 B499–B522.
• [46] Lovász, L. (1996). Random walks on graphs: A survey. In Combinatorics: Paul Erdős Is Eighty, Vol. 2 (Keszthely, 1993). Bolyai Soc. Math. Stud. 2 353–397. János Bolyai Math. Soc., Budapest.
• [47] Madrid Padilla, O. H., Scott, J. G., Sharpnack, J. and Tibshirani, R. J. (2016). The DFS fused lasso: Nearly optimal linear-time denoising over graphs and trees. Available at arXiv:1608.03384.
• [48] Mammen, E. and van de Geer, S. (1997). Locally adaptive regression splines. Ann. Statist. 25 387–413.
• [49] Moore, C. and Newman, M. E. J. (2000). Epidemics and percolation in small-world networks. Phys. Rev. E 61 5678–5682.
• [50] Mumford, D. and Shah, J. (1989). Optimal approximations by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math. 42 577–685.
• [51] Rinaldo, A. (2009). Properties and refinements of the fused lasso. Ann. Statist. 37 2922–2952.
• [52] Rudin, L. I., Osher, S. and Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Phys. D 60 259–268. Experimental mathematics: Computational issues in nonlinear science (Los Alamos, NM, 1991).
• [53] Sadhanala, V., Wang, Y.-X. and Tibshirani, R. (2016). Graph sparsification approaches for Laplacian smoothing. In Int. Conf. Artific. Intell. Statist. 1250–1259.
• [54] Sadhanala, V., Wang, Y.-X. and Tibshirani, R. J. (2016). Total variation classes beyond 1d: Minimax rates, and the limitations of linear smoothers. In Adv. Neural Inform. Process. Syst. 3513–3521.
• [55] Sharpnack, J., Rinaldo, A. and Singh, A. (2012). Sparsistency of the edge lasso over graphs. In Int. Conf. Artific. Intell. Statist. 1028–1036.
• [56] Sharpnack, J., Singh, A. and Rinaldo, A. (2013). Detecting activations over graphs using spanning tree wavelet bases. In Int. Conf. Artific. Intell. Statist. 545–553.
• [57] Sharpnack, J. L., Krishnamurthy, A. and Singh, A. (2013). Near-optimal anomaly detection in graphs using Lovász extended scan statistic. In Adv. Neural Inform. Process. Syst. 1959–1967.
• [58] Spielman, D. A. and Srivastava, N. (2011). Graph sparsification by effective resistances. SIAM J. Comput. 40 1913–1926.
• [59] Spielman, D. A. and Teng, S.-H. (2004). Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In ACM Symp. Theory Comput. 81–90. ACM, New York.
• [60] Tansey, W. and Scott, J. G. (2015). A fast and flexible algorithm for the graph-fused lasso. Available at arXiv:1505.06475.
• [61] Tian, X. and Taylor, J. E. (2015). Selective inference with a randomized response. Available at arXiv:1507.06739.
• [62] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 267–288.
• [63] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 91–108.
• [64] Tibshirani, R. J. and Taylor, J. (2011). The solution path of the generalized lasso. Ann. Statist. 39 1335–1371.
• [65] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360–1392.
• [66] Wang, Y.-X., Sharpnack, J., Smola, A. J. and Tibshirani, R. J. (2016). Trend filtering on graphs. J. Mach. Learn. Res. 17 Paper No. 105.
• [67] Winkler, G. and Liebscher, V. (2002). Smoothers for discontinuous signals. J. Nonparametr. Stat. 14 203–222.
• [68] Xin, B., Kawahara, Y., Wang, Y. and Gao, W. (2014). Efficient generalized fused lasso and its application to the diagnosis of Alzheimer’s disease. In Proc. Assoc. Adv. Artific. Intell. Conf. 2163–2169.
• [69] Yao, Y.-C. (1984). Estimation of a noisy discrete-time step function: Bayes and empirical Bayes approaches. Ann. Statist. 12 1434–1447.
• [70] Yao, Y.-C. (1988). Estimating the number of change-points via Schwarz’ criterion. Statist. Probab. Lett. 6 181–189.
• [71] Yao, Y.-C. and Au, S.-T. (1989). Least-squares estimation of a step function. Sankhyā Ser. A 51 370–381.
• [72] Zhang, Y., Wainwright, M. J. and Jordan, M. I. (2014). Lower bounds on the performance of polynomial-time algorithms for sparse linear regression. In Conf. Learning Theory 35 1–28.
• [73] Zhang, Y., Wainwright, M. J. and Jordan, M. I. (2017). Optimal prediction for sparse linear models? Lower bounds for coordinate-separable M-estimators. Electron. J. Stat. 11 752–799.

#### Supplemental materials

• Supplementary Appendices. The supplementary appendices contain proofs of theoretical results.