## The Annals of Applied Statistics

- Ann. Appl. Stat.
- Volume 2, Number 4 (2008), 1194-1216.

### Interpreting self-organizing maps through space–time data models

Huiyan Sang, Alan E. Gelfand, Chris Lennard, Gabriele Hegerl, and Bruce Hewitson

#### Abstract

Self-organizing maps (SOMs) are a technique that has been used
with high-dimensional data vectors to develop an archetypal set
of states (nodes) that span, in some sense, the high-dimensional
space. Noteworthy applications include weather states as
described by weather variables over a region and speech patterns
as characterized by frequencies in time. The SOM approach is
essentially a neural network model that implements a nonlinear
projection from a high-dimensional input space to a
low-dimensional array of neurons. In the process, it also
becomes a clustering technique, assigning to any vector in the
high-dimensional data space the node (neuron) to which it is
closest (using, say, Euclidean distance) in the data space. The
number of nodes is thus equal to the number of clusters.
However, the primary use for the SOM is as a representation
technique, that is, finding a set of nodes which
representatively *span* the high-dimensional space. These
nodes are typically displayed using maps to enable visualization
of the continuum of the data space. The technique does not
appear to have been discussed in the statistics literature, so it
is our intent here to bring it to the attention of the
community. The technique is implemented algorithmically through
a training set of vectors. However, through the introduction of
stochasticity in the form of a space–time process model, we seek
to illuminate and interpret its performance in the context of
application to daily data collection. That is, the observed
daily state vectors are viewed as a time series of multivariate
process realizations which we try to understand under the
dimension reduction achieved by the SOM procedure.
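
As an illustration of the mechanics described above (a low-dimensional array of nodes, with each data vector assigned to its closest node in Euclidean distance), the following is a minimal NumPy sketch. It assumes the classic online SOM update rule with a Gaussian neighbourhood and linearly decaying learning rate. The function names `train_som` and `assign_nodes`, the 3×4 grid, and the synthetic data are all hypothetical choices made for this example; they are not taken from the paper.

```python
import numpy as np


def train_som(data, grid_shape=(3, 4), n_iter=2000, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM with the classic online update (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = grid_shape
    # Coordinates of each node in the low-dimensional (2-D) array of neurons.
    coords = np.array(
        [(r, c) for r in range(n_rows) for c in range(n_cols)], dtype=float
    )
    # Initialise node weight vectors from randomly chosen data vectors.
    init_idx = rng.choice(len(data), size=len(coords), replace=False)
    weights = data[init_idx].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays to zero
        sigma = sigma0 * (1.0 - frac) + 1e-3     # neighbourhood width shrinks
        x = data[rng.integers(len(data))]        # one randomly drawn training vector
        # Best-matching unit: the node closest to x in the data space.
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
        # Gaussian neighbourhood measured on the node grid, not in the data space.
        grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma**2))
        # Pull the BMU and its grid neighbours towards x.
        weights += lr * h[:, None] * (x - weights)
    return weights, coords


def assign_nodes(data, weights):
    """Cluster step: index of the nearest node (Euclidean) for every data vector."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Synthetic stand-in data: 500 vectors in a 20-dimensional space (an assumption).
rng = np.random.default_rng(42)
data = rng.normal(size=(500, 20))
weights, coords = train_som(data)
labels = assign_nodes(data, weights)
```

Here the number of clusters equals the number of nodes (12 for a 3×4 grid), and `labels` records which node represents each input vector.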

The application we focus on here is to synoptic climatology where the goal is to develop an array of atmospheric states to capture a collection of distinct circulation patterns. In particular, we have daily weather data observed in the form of 11 variables measured for each of 77 grid cells yielding an 847×1 vector for each day. We have such daily vectors for a period of 31 years (11,315 days). Twelve SOM nodes have been obtained by the meteorologists to represent the space of these data vectors. Again, we try to enhance our understanding of dynamic SOM node behavior arising from this dataset.
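
To make the dimensions concrete, the sketch below builds an array with the shape described above (11 variables on 77 grid cells for 31 × 365 = 11,315 days), flattens each day into an 847-vector, labels each day with its nearest of 12 nodes, and tabulates day-to-day node transitions. Everything here is an assumption made for illustration: the data are random placeholders, the "nodes" are simply 12 randomly chosen days rather than trained SOM nodes, and the transition count is a simple descriptive summary, not the space–time model developed in the paper.

```python
import numpy as np

# Dimensions taken from the abstract; the data themselves are synthetic.
N_VARS, N_CELLS, N_DAYS, N_NODES = 11, 77, 31 * 365, 12


def flatten_days(fields):
    """Reshape (days, variables, cells) fields into (days, variables * cells) vectors."""
    return fields.reshape(fields.shape[0], -1)


def nearest_node(vectors, nodes):
    """Index of the closest node per day, using squared Euclidean distance."""
    # ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2, which avoids a large 3-D temporary.
    d2 = (
        (vectors**2).sum(axis=1)[:, None]
        - 2.0 * vectors @ nodes.T
        + (nodes**2).sum(axis=1)[None, :]
    )
    return d2.argmin(axis=1)


def transition_counts(labels, n_nodes):
    """Count how often node i on one day is followed by node j on the next day."""
    counts = np.zeros((n_nodes, n_nodes), dtype=int)
    np.add.at(counts, (labels[:-1], labels[1:]), 1)
    return counts


rng = np.random.default_rng(1)
fields = rng.normal(size=(N_DAYS, N_VARS, N_CELLS))   # placeholder weather fields
vectors = flatten_days(fields)                         # shape (11315, 847)
# Placeholder nodes: 12 randomly chosen days, NOT the meteorologists' SOM nodes.
nodes = vectors[rng.choice(N_DAYS, size=N_NODES, replace=False)]
labels = nearest_node(vectors, nodes)                  # daily node time series
counts = transition_counts(labels, N_NODES)            # 12 x 12 transition table
```

With real data and trained nodes, `labels` would be the daily sequence of atmospheric states, and `counts` would give a first, purely empirical look at how those states follow one another in time.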

#### Article information

**Source**

Ann. Appl. Stat., Volume 2, Number 4 (2008), 1194-1216.

**Dates**

First available in Project Euclid: 8 January 2009

**Permanent link to this document**

https://projecteuclid.org/euclid.aoas/1231424206

**Digital Object Identifier**

doi:10.1214/08-AOAS174

**Mathematical Reviews number (MathSciNet)**

MR2655655

**Zentralblatt MATH identifier**

05505351

**Keywords**

Bivariate spatial predictive process; space–time models; Markov chain Monte Carlo; model choice; vector autoregressive model

#### Citation

Sang, Huiyan; Gelfand, Alan E.; Lennard, Chris; Hegerl, Gabriele; Hewitson, Bruce. Interpreting self-organizing maps through space–time data models. Ann. Appl. Stat. 2 (2008), no. 4, 1194–1216. doi:10.1214/08-AOAS174. https://projecteuclid.org/euclid.aoas/1231424206