The Annals of Applied Statistics

Approximating the conditional density given large observed values via a multivariate extremes framework, with application to environmental data

Daniel Cooley, Richard A. Davis, and Philippe Naveau

Full-text: Open access

Abstract

Phenomena such as air pollution levels are of greatest interest when observations are large, but standard prediction methods are not specifically designed for large observations. We propose a method, rooted in extreme value theory, which approximates the conditional distribution of an unobserved component of a random vector given large observed values. Specifically, for $\mathbf{Z}=(Z_{1},\ldots,Z_{d})^{T}$ and $\mathbf{Z}_{-d}=(Z_{1},\ldots,Z_{d-1})^{T}$, the method approximates the conditional distribution of $[Z_{d}|\mathbf{Z}_{-d}=\mathbf{z}_{-d}]$ when $\|\mathbf{z}_{-d}\|>r_{*}$. The approach is based on the assumption that $\mathbf{Z}$ is a multivariate regularly varying random vector of dimension $d$. The conditional distribution approximation relies on knowledge of the angular measure of $\mathbf{Z}$, which provides explicit structure for dependence in the distribution’s tail. As the method produces a predictive distribution rather than just a point predictor, one can answer any question posed about the quantity being predicted, and, in particular, one can assess how well the extreme behavior is represented.
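The idea can be illustrated in the simplest case. The following is a minimal sketch and not the authors' implementation. It assumes $d=2$, the $L_{1}$ norm, a tail index $\alpha=1$, and an angular measure with density $h$. Under those assumptions the limiting measure has density $\alpha r^{-\alpha-1}h(w)$ in polar coordinates $r=z_{1}+z_{2}$, $w=z_{1}/r$, and changing back to Cartesian coordinates (Jacobian $r$) suggests approximating the conditional density of $[Z_{2}|Z_{1}=z_{1}]$ for large $z_{1}$ by a function proportional to $\|\mathbf{z}\|^{-(\alpha+2)}h(z_{1}/\|\mathbf{z}\|)$. The symmetric Beta density used for $h$, the function names, the grid, and its truncation point are all hypothetical choices made for illustration.

```python
import numpy as np
from math import gamma


def beta_angular_density(w, a=2.0):
    """Hypothetical angular density h on (0, 1): a symmetric Beta(a, a), a >= 1."""
    const = gamma(2.0 * a) / gamma(a) ** 2
    return const * w ** (a - 1.0) * (1.0 - w) ** (a - 1.0)


def trapezoid_steps(y, x):
    """Area of each trapezoid between consecutive grid points."""
    return 0.5 * (y[1:] + y[:-1]) * np.diff(x)


def approx_conditional_density(z1, z2_grid, alpha=1.0, a=2.0):
    """Approximate density of Z2 given Z1 = z1, normalized over z2_grid."""
    r = z1 + z2_grid                      # L1 norm of (z1, z2)
    w = z1 / r                            # angular component of the first coordinate
    unnorm = r ** (-(alpha + 2.0)) * beta_angular_density(w, a)
    return unnorm / trapezoid_steps(unnorm, z2_grid).sum()


def conditional_quantile(p, z1, alpha=1.0, a=2.0, n=200_001, upper=200.0):
    """p-quantile of the approximate conditional law of Z2 given Z1 = z1."""
    grid = np.linspace(0.0, upper * z1, n)   # assumption: truncate at upper * z1
    dens = approx_conditional_density(z1, grid, alpha, a)
    cdf = np.concatenate(([0.0], np.cumsum(trapezoid_steps(dens, grid))))
    return float(np.interp(p, cdf, grid))


if __name__ == "__main__":
    for z1 in (10.0, 100.0):
        q50 = conditional_quantile(0.50, z1)
        q95 = conditional_quantile(0.95, z1)
        print(f"z1={z1:6.1f}  median={q50:8.3f}  95% quantile={q95:8.3f}")
```

Because the output is a full predictive density rather than a point predictor, any quantile or exceedance probability can be read off it. In this sketch the approximate conditional law of $Z_{2}/z_{1}$ does not depend on $z_{1}$, so the quantiles scale linearly with the observed value; this is a consequence of the homogeneity of the limiting measure.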

Using a fitted model for the angular measure, we apply our method to nitrogen dioxide measurements in metropolitan Washington DC. We obtain a predictive distribution for the air pollutant at a location given the air pollutant’s measurements at four nearby locations and given that the norm of the vector of the observed measurements is large.

Article information

Source
Ann. Appl. Stat., Volume 6, Number 4 (2012), 1406-1429.

Dates
First available in Project Euclid: 27 December 2012

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1356629045

Digital Object Identifier
doi:10.1214/12-AOAS554

Mathematical Reviews number (MathSciNet)
MR3058669

Zentralblatt MATH identifier
1257.62118

Keywords
Multivariate regular variation; threshold exceedances; angular or spectral measure; air pollution; nitrogen dioxide monitoring

Citation

Cooley, Daniel; Davis, Richard A.; Naveau, Philippe. Approximating the conditional density given large observed values via a multivariate extremes framework, with application to environmental data. Ann. Appl. Stat. 6 (2012), no. 4, 1406–1429. doi:10.1214/12-AOAS554. https://projecteuclid.org/euclid.aoas/1356629045

