Missouri Journal of Mathematical Sciences

Identifying Outlying Observations in Regression Trees

Nicholas Granered and Samantha C. Bates Prins


Abstract

Regression trees are an alternative to classical linear regression models; they fit a piecewise constant model to the data. The structure of regression trees makes them well suited to modeling data that contain outliers. We propose an algorithm that takes advantage of this feature to detect outliers automatically. The new algorithm performs well on the four test datasets [7] that are considered necessary benchmarks for a valid outlier detection algorithm in the linear regression context, even though regression trees lack the global linearity assumption. We also demonstrate the practical use of this approach by detecting outliers in an ecological dataset collected in the Shenandoah Valley.

Article information

Source
Missouri J. Math. Sci., Volume 28, Issue 1 (2016), 76–87.

Dates
First available in Project Euclid: 19 September 2016

Permanent link to this document
https://projecteuclid.org/euclid.mjms/1474295357

Digital Object Identifier
doi:10.35834/mjms/1474295357

Mathematical Reviews number (MathSciNet)
MR3549809

Zentralblatt MATH identifier
1348.62134

Subjects
Primary: 62G08: Nonparametric regression

Keywords
outlier detection; influential observations; backward-stepping; robust models; outlier; CART

Citation

Granered, Nicholas; Bates Prins, Samantha C. Identifying Outlying Observations in Regression Trees. Missouri J. Math. Sci. 28 (2016), no. 1, 76–87. doi:10.35834/mjms/1474295357. https://projecteuclid.org/euclid.mjms/1474295357



References

  • R. Chambers, A. Hentges, and X. Zhao, Robust automatic methods for outlier and error detection, Journal of the Royal Statistical Society: Series A, 167.2 (2004), 323–339.
  • N. Cheze and J.-M. Poggi, Iterated Boosting for Outlier Detection, Data Science and Classification, Springer, Berlin, 2006, pp. 213–220.
  • J. Coleman et al., Equality of Educational Opportunity, Office of Education, U.S. Department of Health, Education, and Welfare, Washington, D.C., 1966.
  • G. De'ath and K. Fabricius, Classification and regression trees: A powerful yet simple technique for ecological data analysis, Ecology, 81.11 (2000), 3178–3192.
  • J. Elith, J. Leathwick, and T. Hastie, A working guide to boosted regression trees, Journal of Animal Ecology, 77 (2008), 802–813.
  • Friends of the Shenandoah River, Monitoring Data Files - FOSR, 2013, http://fosr.org/state-of-the-river/monitoring-data-files/.
  • A. Hadi and J. Simonoff, Procedures for the identification of multiple outliers in linear models, Journal of the American Statistical Association, 88.424 (1993), 1264–1272.
  • S. Psarakis and J. Panaretos, The folded t distribution, Communications in Statistics-Theory and Methods, 19.7 (1990), 2717–2734.
  • R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2012, http://www.R-project.org/.
  • A. H. M. Rahmatullah Imon, Identifying multiple influential observations in linear regression, Journal of Applied Statistics, 32.9 (2005), 929–946.
  • B. Ripley, tree: Classification and Regression Trees, R package version 1.0-35, 2014.
  • P. Rousseeuw and A. Leroy, Robust Regression and Outlier Detection, John Wiley, New York, 1987.
  • J. Simonoff, General approaches to stepwise identification of unusual values in data analysis, Directions in Robust Statistics and Diagnostics, Springer, New York, 1991, pp. 223–242.
  • K. Zirkle, Predicting water quality in the Shenandoah Valley, Senior Honors Thesis, James Madison University, Harrisonburg, VA, 2013.