The Annals of Applied Statistics

Outline analyses of the called strike zone in Major League Baseball

Dale L. Zimmerman, Jun Tang, and Rui Huang

Abstract

We extend statistical shape analytic methods known as outline analysis for application to the strike zone, a central feature of the game of baseball. Although the strike zone is rigorously defined by Major League Baseball’s official rules, umpires make mistakes in calling pitches as strikes (and balls) and may even adhere to a strike zone somewhat different than that prescribed by the rule book. Our methods yield inference on geometric attributes (centroid, dimensions, orientation and shape) of this “called strike zone” (CSZ) and on the effects that years, umpires, player attributes, game situation factors and their interactions have on those attributes. The methodology consists of first using kernel discriminant analysis to determine a noisy outline representing the CSZ corresponding to each factor combination, then fitting existing elliptic Fourier and new generalized superelliptic models for closed curves to that outline and finally analyzing the fitted model coefficients using standard methods of regression analysis, factorial analysis of variance and variance component estimation. We apply these methods to PITCHf/x data comprising more than three million called pitches from the 2008–2016 Major League Baseball seasons to address numerous questions about the CSZ. We find that all geometric attributes of the CSZ, except its size, became significantly more like those of the rule-book strike zone from 2008–2016 and that several player attribute/game situation factors had statistically and practically significant effects on many of them. We also establish that the variation in the horizontal center, width and area of an individual umpire’s CSZ from pitch to pitch is smaller than their variation among CSZs from different umpires.
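
As a rough illustration of the kind of closed-curve model the abstract mentions, the sketch below generates points on a standard superellipse (Lamé curve), |(x - cx)/a|^n + |(y - cy)/b|^n = 1, and recovers two of the geometric attributes named above (area and centroid) with the shoelace formula. It is not the authors' generalized superelliptic model or their fitting algorithm. The function names and the zone dimensions are hypothetical and chosen only for illustration.

```python
import math


def superellipse_outline(cx, cy, a, b, n, num_points=2000):
    """Return points on the standard superellipse
    |(x - cx)/a|^n + |(y - cy)/b|^n = 1, ordered counterclockwise.

    Assumption: this is the classical Lame curve, not the paper's
    generalized superelliptic model.
    """
    pts = []
    for i in range(num_points):
        t = 2.0 * math.pi * i / num_points
        c, s = math.cos(t), math.sin(t)
        # Signed power keeps each point in the correct quadrant.
        x = cx + a * math.copysign(abs(c) ** (2.0 / n), c)
        y = cy + b * math.copysign(abs(s) ** (2.0 / n), s)
        pts.append((x, y))
    return pts


def polygon_area_and_centroid(pts):
    """Shoelace formula: signed area and centroid of a closed polygon."""
    area2 = 0.0
    gx = gy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        gx += (x0 + x1) * cross
        gy += (y0 + y1) * cross
    area = area2 / 2.0
    return area, (gx / (6.0 * area), gy / (6.0 * area))


# Hypothetical zone: centered 2.5 ft above the ground, half-width 0.85 ft,
# half-height 1.0 ft. These numbers are illustrative, not from the paper.
ellipse = superellipse_outline(0.0, 2.5, 0.85, 1.0, n=2.0)
boxy = superellipse_outline(0.0, 2.5, 0.85, 1.0, n=4.0)

ellipse_area, ellipse_centroid = polygon_area_and_centroid(ellipse)
boxy_area, boxy_centroid = polygon_area_and_centroid(boxy)
print("n=2 area:", round(ellipse_area, 3), "centroid:", ellipse_centroid)
print("n=4 area:", round(boxy_area, 3), "centroid:", boxy_centroid)
```

Raising the exponent n moves the outline from an ellipse (n = 2) toward a rounded rectangle, so a single shape parameter can express how closely an outline resembles a rectangular zone.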

Article information

Source
Ann. Appl. Stat., Volume 13, Number 4 (2019), 2416-2451.

Dates
Received: January 2018
Revised: May 2019
First available in Project Euclid: 28 November 2019

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1574910050

Digital Object Identifier
doi:10.1214/19-AOAS1285

Mathematical Reviews number (MathSciNet)
MR4037436

Keywords
Elliptic Fourier model, kernel discriminant analysis, morphometrics, orthogonal distance fitting, shape analysis, superellipse

Citation

Zimmerman, Dale L.; Tang, Jun; Huang, Rui. Outline analyses of the called strike zone in Major League Baseball. Ann. Appl. Stat. 13 (2019), no. 4, 2416–2451. doi:10.1214/19-AOAS1285. https://projecteuclid.org/euclid.aoas/1574910050


Supplemental materials

  • Supplement A to “Outline analyses of the called strike zone in Major League Baseball”. We include in this supplementary material an animation depicting the evolution of the called strike zone from 2008–2016.
  • Supplement B to “Outline analyses of the called strike zone in Major League Baseball”. We include in this supplementary material R code for fitting elliptic Fourier models.
  • Supplement C to “Outline analyses of the called strike zone in Major League Baseball”. We include in this supplementary material details of the algorithm for fitting generalized superellipses and other closed curves.
  • Supplement D to “Outline analyses of the called strike zone in Major League Baseball”. We include in this supplementary material an R package for fitting generalized superelliptical models.
  • Supplement E to “Outline analyses of the called strike zone in Major League Baseball”. We include in this supplementary material displays of KDA-based and fitted ATLAS called strike zones and a list of estimated ATLAS coefficients corresponding to the 96 factor combinations described in Section 6. We also give the MANOVA table, ANOVA tables, and standard errors of weighted level means described in Section 6, and give similar results for a model that includes main effects of year and interactions of all other factors with year. Finally, we list bootstrap variances of ATLAS parameter estimates of called strike zones for selected factor combinations.
  • Supplement F to “Outline analyses of the called strike zone in Major League Baseball”. We include in this supplementary material a derivation of the form of the confidence interval for the proportion of variability in geometric attributes of called strike zone outlines attributable to umpires.