The Annals of Applied Statistics

Interpreting self-organizing maps through space–time data models

Huiyan Sang, Alan E. Gelfand, Chris Lennard, Gabriele Hegerl, and Bruce Hewitson

Full-text: Open access

Abstract

Self-organizing maps (SOMs) are a technique that has been used with high-dimensional data vectors to develop an archetypal set of states (nodes) that span, in some sense, the high-dimensional space. Noteworthy applications include weather states as described by weather variables over a region and speech patterns as characterized by frequencies in time. The SOM approach is essentially a neural network model that implements a nonlinear projection from a high-dimensional input space to a low-dimensional array of neurons. In the process, it also becomes a clustering technique, assigning to any vector in the high-dimensional data space the node (neuron) to which it is closest (using, say, Euclidean distance) in the data space. The number of nodes is thus equal to the number of clusters. However, the primary use for the SOM is as a representation technique, that is, finding a set of nodes which representatively span the high-dimensional space. These nodes are typically displayed using maps to enable visualization of the continuum of the data space. The technique does not appear to have been discussed in the statistics literature so it is our intent here to bring it to the attention of the community. The technique is implemented algorithmically through a training set of vectors. However, through the introduction of stochasticity in the form of a space–time process model, we seek to illuminate and interpret its performance in the context of application to daily data collection. That is, the observed daily state vectors are viewed as a time series of multivariate process realizations which we try to understand under the dimension reduction achieved by the SOM procedure.
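The two roles described above, building a set of nodes and then assigning each data vector to its nearest node under Euclidean distance, can be sketched roughly as follows. This is a minimal NumPy illustration and not the authors' implementation: the function names `train_som` and `assign_nodes`, the 3×4 node grid, the linear decay schedules, and the toy data are all assumptions made for the example.

```python
import numpy as np

def train_som(data, grid_shape=(3, 4), n_iter=2000, lr0=0.5, sigma0=1.5, seed=0):
    """Online SOM training (illustrative); returns node vectors, shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Coordinates of each node in the low-dimensional array of neurons
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node vectors from randomly chosen training vectors (an assumption)
    start = rng.choice(len(data), size=rows * cols, replace=False)
    weights = data[start].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-2   # shrinking neighbourhood width
        x = data[rng.integers(len(data))]      # one training vector per step
        # Best-matching node: closest in the data space (Euclidean distance)
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
        # Neighbours of that node on the grid are pulled toward x as well
        grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def assign_nodes(data, weights):
    """Clustering step: index of the nearest node for every data vector."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Toy data standing in for high-dimensional state vectors (hypothetical)
toy = np.random.default_rng(1).normal(size=(200, 5))
nodes = train_som(toy)
labels = assign_nodes(toy, nodes)
```

Here the number of clusters equals the number of nodes (12 for a 3×4 grid), and the neighbourhood term `h` is what ties nearby nodes on the grid to similar regions of the data space.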

The application we focus on here is to synoptic climatology where the goal is to develop an array of atmospheric states to capture a collection of distinct circulation patterns. In particular, we have daily weather data observed in the form of 11 variables measured for each of 77 grid cells yielding an 847×1 vector for each day. We have such daily vectors for a period of 31 years (11,315 days). Twelve SOM nodes have been obtained by the meteorologists to represent the space of these data vectors. Again, we try to enhance our understanding of dynamic SOM node behavior arising from this dataset.
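The dimensions quoted above can be checked directly: 11 variables × 77 grid cells gives 847 entries per day, and 31 × 365 = 11,315 days. The sketch below uses synthetic random numbers in place of the real weather data and random stand-in node vectors in place of the meteorologists' twelve SOM nodes, so only the shapes are meaningful. The day-to-day transition count at the end is one simple, assumed way to summarize a node-label time series; it is not the space–time model developed in the paper.

```python
import numpy as np

N_VARS, N_CELLS, N_NODES = 11, 77, 12
N_DAYS = 31 * 365                      # 11,315 days, as stated in the abstract

rng = np.random.default_rng(0)
# Synthetic stand-in for the daily fields: (day, grid cell, variable)
fields = rng.normal(size=(N_DAYS, N_CELLS, N_VARS))
# Stack each day's field into one long state vector of length 847
daily = fields.reshape(N_DAYS, N_CELLS * N_VARS)

# Hypothetical node vectors; in practice these come from SOM training
nodes = rng.normal(size=(N_NODES, N_CELLS * N_VARS))

# Squared Euclidean distances via ||x||^2 - 2 x.w + ||w||^2 (avoids a 3-D array)
d2 = (
    (daily ** 2).sum(axis=1)[:, None]
    - 2.0 * daily @ nodes.T
    + (nodes ** 2).sum(axis=1)[None, :]
)
labels = d2.argmin(axis=1)             # one node label per day

# Count transitions from the node on day t to the node on day t+1
counts = np.zeros((N_NODES, N_NODES), dtype=int)
np.add.at(counts, (labels[:-1], labels[1:]), 1)
```

Row `i` of `counts` then records how often a day assigned to node `i` is followed by a day assigned to each of the twelve nodes.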

Article information

Source
Ann. Appl. Stat., Volume 2, Number 4 (2008), 1194–1216.

Dates
First available in Project Euclid: 8 January 2009

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1231424206

Digital Object Identifier
doi:10.1214/08-AOAS174

Mathematical Reviews number (MathSciNet)
MR2655655

Zentralblatt MATH identifier
05505351

Keywords
Bivariate spatial predictive process; space–time models; Markov chain Monte Carlo; model choice; vector autoregressive model

Citation

Sang, Huiyan; Gelfand, Alan E.; Lennard, Chris; Hegerl, Gabriele; Hewitson, Bruce. Interpreting self-organizing maps through space–time data models. Ann. Appl. Stat. 2 (2008), no. 4, 1194–1216. doi:10.1214/08-AOAS174. https://projecteuclid.org/euclid.aoas/1231424206


