Abstract
Phenomena such as air pollution levels are of greatest interest when observations are large, but standard prediction methods are not specifically designed for large observations. We propose a method, rooted in extreme value theory, which approximates the conditional distribution of an unobserved component of a random vector given large observed values. Specifically, for $\mathbf{Z}=(Z_{1},\ldots,Z_{d})^{T}$ and $\mathbf{Z}_{-d}=(Z_{1},\ldots,Z_{d-1})^{T}$, the method approximates the conditional distribution of $[Z_{d}|\mathbf{Z}_{-d}=\mathbf{z}_{-d}]$ when $\|\mathbf{z}_{-d}\|>r_{*}$ for a large threshold $r_{*}$. The approach is based on the assumption that $\mathbf{Z}$ is a multivariate regularly varying random vector of dimension $d$. The conditional distribution approximation relies on knowledge of the angular measure of $\mathbf{Z}$, which provides explicit structure for dependence in the distribution’s tail. Because the method produces a predictive distribution rather than just a point predictor, one can answer any question posed about the quantity being predicted and, in particular, assess how well the extreme behavior is represented.
Using a fitted model for the angular measure, we apply our method to nitrogen dioxide measurements in metropolitan Washington DC. We obtain a predictive distribution for the air pollutant at a location given the air pollutant’s measurements at four nearby locations and given that the norm of the vector of the observed measurements is large.
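The idea of approximating a conditional density from an angular measure can be illustrated with a minimal numerical sketch. It rests on assumptions that are not stated in the abstract: the $L_1$ norm, nonnegative components, a tail index `alpha`, and an angular measure with density $h$ on the unit simplex, so that the limiting measure has density proportional to $\|\mathbf{z}\|^{-(\alpha+d)}h(\mathbf{z}/\|\mathbf{z}\|)$. The symmetric Dirichlet density used for $h$, the function names, and all numerical values are hypothetical stand-ins, not the fitted model or data from the paper.

```python
import math
import numpy as np

def dirichlet_angular_density(w, conc):
    """Hypothetical angular density h on the unit simplex (Dirichlet stand-in)."""
    w = np.asarray(w, dtype=float)
    conc = np.asarray(conc, dtype=float)
    log_norm = math.lgamma(conc.sum()) - sum(math.lgamma(c) for c in conc)
    return np.exp(log_norm + np.sum((conc - 1.0) * np.log(w), axis=-1))

def approx_conditional_density(z_obs, grid, alpha, conc):
    """Approximate the density of Z_d given Z_{-d} = z_obs on a grid of z_d values.

    Assumes the L1 norm and positive components, so the limiting-measure
    density is proportional to ||z||^{-(alpha + d)} * h(z / ||z||).
    """
    z_obs = np.asarray(z_obs, dtype=float)
    grid = np.asarray(grid, dtype=float)
    d = z_obs.size + 1
    # Append each candidate z_d to the observed components: shape (len(grid), d).
    z = np.column_stack([np.tile(z_obs, (grid.size, 1)), grid])
    r = z.sum(axis=1)        # L1 norm of a positive vector
    w = z / r[:, None]       # angular component on the simplex
    unnorm = r ** (-(alpha + d)) * dirichlet_angular_density(w, conc)
    # Normalize numerically with the trapezoid rule over the (truncated) grid.
    const = np.sum(0.5 * (unnorm[1:] + unnorm[:-1]) * np.diff(grid))
    return unnorm / const

# Made-up example: four large observed values, predict the fifth component.
z_obs = np.array([30.0, 25.0, 40.0, 35.0])
grid = np.linspace(0.01, 500.0, 20001)
dens = approx_conditional_density(z_obs, grid, alpha=1.0, conc=np.full(5, 2.0))
```

The normalizing constant is computed over a finite grid, so any mass beyond the upper grid limit is ignored; a wider grid or an analytic tail correction would be needed when that mass is not negligible.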
Citation
Daniel Cooley, Richard A. Davis, Philippe Naveau. "Approximating the conditional density given large observed values via a multivariate extremes framework, with application to environmental data." Ann. Appl. Stat. 6 (4) 1406-1429, December 2012. https://doi.org/10.1214/12-AOAS554