Annals of Statistics

Asymptotic distribution-free change-point detection for multivariate and non-Euclidean data

Lynna Chu and Hao Chen

Abstract

We consider the testing and estimation of change-points, locations where the distribution abruptly changes, in a sequence of multivariate or non-Euclidean observations. We study a nonparametric framework that utilizes similarity information among observations, which can be applied to various data types as long as an informative similarity measure on the sample space can be defined. The existing approach along this line has low power and/or biased estimates for change-points under some common scenarios. We address these problems by considering new tests based on similarity information. Simulation studies show that the new approaches exhibit substantial improvements in detecting and estimating change-points. In addition, under some mild conditions, the new test statistics are asymptotically distribution-free under the null hypothesis of no change. Analytic $p$-value approximations to the significance of the new test statistics for the single change-point alternative and changed interval alternative are derived, making the new approaches easy off-the-shelf tools for large datasets. The new approaches are illustrated in an analysis of New York taxi data.
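
As a rough illustration of what "similarity information among observations" can mean in practice, the sketch below builds a minimum spanning tree on pairwise Euclidean distances and, for each candidate split time t, counts the tree edges joining an observation before t to one after t. Unusually few such edges suggest a change near t. This follows the basic between-group edge-count idea from the earlier graph-based line of work, with the null mean and standard deviation estimated by permutation. It does not implement the new test statistics or the analytic p-value approximations of this paper. The function names, the `margin` and `n_perm` parameters, and the simulated data are assumptions made purely for illustration.

```python
import math
import random
import statistics


def mst_edges(points):
    """Return the n - 1 edges (i, j) of a Euclidean minimum spanning tree (Prim)."""
    n = len(points)
    in_tree = [False] * n
    in_tree[0] = True
    # best[k] = (distance from k to the current tree, tree node achieving it)
    best = [(math.dist(points[0], p), 0) for p in points]
    edges = []
    for _ in range(n - 1):
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k][0])
        edges.append((best[j][1], j))
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k]:
                d = math.dist(points[j], points[k])
                if d < best[k][0]:
                    best[k] = (d, j)
    return edges


def crossing_count(edges, t):
    """Count edges joining one of the first t observations to a later one."""
    return sum(1 for i, j in edges if (i < t) != (j < t))


def scan_change_point(points, n_perm=200, margin=5, seed=0):
    """Scan over split times t and return (t, z) with the most negative z-score.

    The z-score standardizes the crossing count at t by its mean and standard
    deviation under random permutations of the time order (illustrative only).
    """
    n = len(points)
    edges = mst_edges(points)
    ts = range(margin, n - margin + 1)
    observed = {t: crossing_count(edges, t) for t in ts}

    rng = random.Random(seed)
    null = {t: [] for t in ts}
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        permuted = [(perm[i], perm[j]) for i, j in edges]
        for t in ts:
            null[t].append(crossing_count(permuted, t))

    z = {
        t: (observed[t] - statistics.mean(null[t])) / statistics.pstdev(null[t])
        for t in ts
    }
    t_hat = min(z, key=z.get)
    return t_hat, z[t_hat]


# Simulated example (an assumption): a mean shift after the 30th observation.
rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
points += [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(30)]
tau_hat, z_min = scan_change_point(points)
print(tau_hat, round(z_min, 2))
```

Only the distance function would need to change to apply the same scan to non-Euclidean data, since the statistic depends on the observations solely through the similarity graph.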

Article information

Source
Ann. Statist., Volume 47, Number 1 (2019), 382-414.

Dates
Received: June 2017
Revised: February 2018
First available in Project Euclid: 30 November 2018

Permanent link to this document
https://projecteuclid.org/euclid.aos/1543568592

Digital Object Identifier
doi:10.1214/18-AOS1691

Mathematical Reviews number (MathSciNet)
MR3910545

Zentralblatt MATH identifier
07036205

Subjects
Primary: 62G32: Statistics of extreme values; tail inference
Secondary: 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]

Keywords
Change-point; graph-based tests; nonparametric; scan statistic; tail probability; high-dimensional data; network data; non-Euclidean data

Citation

Chu, Lynna; Chen, Hao. Asymptotic distribution-free change-point detection for multivariate and non-Euclidean data. Ann. Statist. 47 (2019), no. 1, 382–414. doi:10.1214/18-AOS1691. https://projecteuclid.org/euclid.aos/1543568592


References

  • Carlstein, E., Müller, H.-G. and Siegmund, D., eds. (1994). Change-Point Problems. Institute of Mathematical Statistics Lecture Notes—Monograph Series 23. IMS, Hayward, CA. Papers from the AMS-IMS-SIAM Summer Research Conference held at Mt. Holyoke College, South Hadley, MA, July 11–16, 1992.
  • Chen, H., Chen, X. and Su, Y. (2017). A weighted edge-count two-sample test for multivariate and object data. J. Amer. Statist. Assoc. 112. To appear. DOI:10.1080/01621459.2017.1307757.
  • Chen, H. and Friedman, J. H. (2017). A new graph-based two-sample test for multivariate and object data. J. Amer. Statist. Assoc. 112 397–409.
  • Chen, J. and Gupta, A. K. (2012). Parametric Statistical Change Point Analysis: With Applications to Genetics, Medicine, and Finance, 2nd ed. Birkhäuser/Springer, New York.
  • Chen, L. H. and Shao, Q.-M. (1994). Stein’s Method for Normal Approximation. In An Introduction to Stein’s Method. Lecture Notes Series 4 1–59. World Scientific, Singapore.
  • Chen, H. and Zhang, N. (2015). Graph-based change-point detection. Ann. Statist. 43 139–176.
  • Chu, L. and Chen, H. (2019). Supplement to “Asymptotic distribution-free change-point detection for multivariate and non-Euclidean data.” DOI:10.1214/18-AOS1691SUPP.
  • Csörgő, M. and Horváth, L. (1997). Limit Theorems in Change-Point Analysis. Wiley, Chichester.
  • Cule, M., Samworth, R. and Stewart, M. (2010). Maximum likelihood estimation of a multi-dimensional log-concave density. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 545–607.
  • Desobry, F., Davy, M. and Doncarli, C. (2005). An online kernel change detection algorithm. IEEE Trans. Signal Process. 53 2961–2974.
  • Friedman, J. H. and Rafsky, L. C. (1979). Multivariate generalizations of the Wald–Wolfowitz and Smirnov two-sample tests. Ann. Statist. 7 697–717.
  • Heard, N. A., Weston, D. J., Platanioti, K. and Hand, D. J. (2010). Bayesian anomaly detection methods for social networks. Ann. Appl. Stat. 4 645–662.
  • Jirak, M. (2015). Uniform change point tests in high dimension. Ann. Statist. 43 2451–2483.
  • Kossinets, G. and Watts, D. J. (2006). Empirical analysis of an evolving social network. Science 311 88–90.
  • Lung-Yut-Fong, A., Lévy-Leduc, C. and Cappé, O. (2015). Homogeneity and change-point detection tests for multivariate data using rank statistics. J. SFdS 156 133–162.
  • Matteson, D. S. and James, N. A. (2014). A nonparametric approach for multiple change point analysis of multivariate data. J. Amer. Statist. Assoc. 109 334–345.
  • Park, Y., Wang, H., Nöbauer, T., Vaziri, A. and Priebe, C. E. (2015). Anomaly detection on whole-brain functional imaging of neuronal activity using graph scan statistics. In ACM Conference on Knowledge Discovery and Data Mining (KDD), Workshop on Outlier Definition, Detection, and Description (ODDx3).
  • Siegmund, D. and Yakir, B. (2007). The Statistics of Gene Mapping. Statistics for Biology and Health. Springer, New York.
  • Wang, H., Tang, M., Park, Y. and Priebe, C. E. (2014). Locality statistics for anomaly detection in time series of graphs. IEEE Trans. Signal Process. 62 703–717.
  • Xie, Y. and Siegmund, D. (2013). Sequential multi-sensor change-point detection. Ann. Statist. 41 670–692.
  • Zhang, N. R., Siegmund, D. O., Ji, H. and Li, J. Z. (2010). Detecting simultaneous changepoints in multiple sequences. Biometrika 97 631–645.

Supplemental materials

  • Supplement to “Asymptotic distribution-free change-point detection for multivariate and non-Euclidean data”. The supplementary material contains the new test statistics for the changed-interval alternative, additional technical results and proofs, more illustrations of the data, additional power and analytical critical value tables and further discussion on the conditions of the graph and the relationship between the new statistics, including an extension of the max-type statistic.