## The Annals of Applied Statistics

### Approximating the conditional density given large observed values via a multivariate extremes framework, with application to environmental data

#### Abstract

Phenomena such as air pollution levels are of greatest interest when observations are large, but standard prediction methods are not specifically designed for large observations. We propose a method, rooted in extreme value theory, which approximates the conditional distribution of an unobserved component of a random vector given large observed values. Specifically, for $\mathbf{Z}=(Z_{1},\ldots,Z_{d})^{T}$ and $\mathbf{Z}_{-d}=(Z_{1},\ldots,Z_{d-1})^{T}$, the method approximates the conditional distribution of $[Z_{d}|\mathbf{Z}_{-d}=\mathbf{z}_{-d}]$ when $\|\mathbf{z}_{-d}\|>r_{*}$. The approach is based on the assumption that $\mathbf{Z}$ is a multivariate regularly varying random vector of dimension $d$. The conditional distribution approximation relies on knowledge of the angular measure of $\mathbf{Z}$, which provides explicit structure for dependence in the distribution’s tail. As the method produces a predictive distribution rather than just a point predictor, one can answer any question posed about the quantity being predicted, and, in particular, one can assess how well the extreme behavior is represented.

Using a fitted model for the angular measure, we apply our method to nitrogen dioxide measurements in metropolitan Washington DC. We obtain a predictive distribution for the air pollutant at a location given the air pollutant’s measurements at four nearby locations and given that the norm of the vector of the observed measurements is large.

#### Article information

Source
Ann. Appl. Stat., Volume 6, Number 4 (2012), 1406-1429.

Dates
First available in Project Euclid: 27 December 2012

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1356629045

Digital Object Identifier
doi:10.1214/12-AOAS554

Mathematical Reviews number (MathSciNet)
MR3058669

Zentralblatt MATH identifier
1257.62118

#### Citation

Cooley, Daniel; Davis, Richard A.; Naveau, Philippe. Approximating the conditional density given large observed values via a multivariate extremes framework, with application to environmental data. Ann. Appl. Stat. 6 (2012), no. 4, 1406–1429. doi:10.1214/12-AOAS554. https://projecteuclid.org/euclid.aoas/1356629045
