The Annals of Applied Statistics

Tree models for difference and change detection in a complex environment

Yong Wang, Ilze Ziedins, Mark Holmes, and Neil Challands


Abstract

A new family of tree models is proposed, which we call “differential trees.” A differential tree model is constructed from multiple data sets and aims to detect distributional differences between them. The new methodology differs from the existing difference and change detection techniques in its nonparametric nature, model construction from multiple data sets, and applicability to high-dimensional data. Through a detailed study of an arson case in New Zealand, where an individual is known to have been laying vegetation fires within a certain time period, we illustrate how these models can help detect changes in the frequencies of event occurrences and uncover unusual clusters of events in a complex environment.

Article information

Source
Ann. Appl. Stat., Volume 6, Number 3 (2012), 1162-1184.

Dates
First available in Project Euclid: 31 August 2012

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1346418578

Digital Object Identifier
doi:10.1214/12-AOAS548

Mathematical Reviews number (MathSciNet)
MR3012525

Zentralblatt MATH identifier
1254.62068

Keywords
Tree models; change detection; event data; $p$-value adjustment; arson; case study

Citation

Wang, Yong; Ziedins, Ilze; Holmes, Mark; Challands, Neil. Tree models for difference and change detection in a complex environment. Ann. Appl. Stat. 6 (2012), no. 3, 1162–1184. doi:10.1214/12-AOAS548. https://projecteuclid.org/euclid.aoas/1346418578



