Clustering Techniques in Climate Analysis
This is an advance summary of a forthcoming article in the Oxford Research Encyclopedia of Climate Science. Please check back later for the full article.
Clustering techniques are used in the analysis of weather and climate to identify distinct, discrete groups of atmospheric and oceanic structures and evolutions from observations, reanalyses, and numerical model simulations and predictions. The goal of cluster analysis is to provide physical insight into states (and trajectories) that are preferred and also possibly unusually persistent, when such states can be identified and distinguished from the continuous background distribution of geophysical variables. “Preferred” states (or evolutions) are those that are significantly more likely to occur than would be predicted by a suitable background distribution (such as a multivariate Gaussian distribution), while “persistent” states are those with lifetimes distinctly longer than those of the background states.
The choice of technique depends largely on the application. For example, the identification of a small number of distinct patterns of the seasonal mean mid-latitude response to large seasonal mean shifts in tropical diabatic heating (perhaps due to the El Niño–Southern Oscillation) can be accomplished with either a partitioning or a hierarchical cluster analysis. The partitioning cluster method groups all states (maps of a given variable) into clusters so as to minimize the within-cluster variance, while the hierarchical analysis merges fields into groups based on their similarities. The identification of preferred patterns (whether or not they are tropically forced) on intra-seasonal time scales can also be accomplished in this way. The partitioning approach can easily be adapted to include multiple variables, and to describe tracks of localized features (such as cyclones). A variant of the partitioning cluster analysis, the “self-organizing map” approach, allows for a greater richness in cluster patterns and so can be useful on shorter, weather-related time scales.
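The contrast between the two families can be illustrated with a minimal sketch. It assumes each map has been flattened to a short list of numbers, uses Lloyd's k-means algorithm as one common partitioning method, and uses average-linkage agglomeration as one possible hierarchical method. The function names, the toy data, and the choice of squared Euclidean distance are illustrative assumptions, not taken from the article.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two flattened maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(maps, k, n_iter=100, seed=0):
    """Partitioning: alternate assignment and centroid updates (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = [list(m) for m in rng.sample(maps, k)]  # start from k random maps
    labels = None
    for _ in range(n_iter):
        # Assign every map to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(m, centroids[c]))
                      for m in maps]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [m for m, lab in zip(maps, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def within_cluster_ss(maps, labels, centroids):
    """Sum of squared distances to the assigned centroid (the quantity k-means reduces)."""
    return sum(sq_dist(m, centroids[lab]) for m, lab in zip(maps, labels))

def agglomerate(maps, k):
    """Hierarchical: repeatedly merge the two most similar groups (average linkage)."""
    clusters = [[i] for i in range(len(maps))]  # every map starts as its own group
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(sq_dist(maps[i], maps[j]) for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a is unaffected by the pop
    return clusters

# Toy data (hypothetical): six two-point "maps" drawn from two obvious regimes.
maps = [[1.0, 0.9], [1.1, 1.0], [0.9, 1.1],
        [-1.0, -1.1], [-0.9, -1.0], [-1.1, -0.9]]
labels, centroids = kmeans(maps, k=2)
groups = agglomerate(maps, k=2)
print(labels, within_cluster_ss(maps, labels, centroids))
print(groups)
```

In this sketch both methods assign each map to exactly one cluster. The partitioning routine needs the number of clusters in advance, whereas the merging loop could be stopped at any level of the hierarchy.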
In either the partitioning or hierarchical analysis, each state (map) is identified uniquely with a given cluster. However, in certain applications it may be desirable to allow a given state to belong to multiple clusters with differing probabilities. In such cases one can estimate the underlying probability distribution function with a mixture model, which is a weighted sum of a (usually small) number of component multivariate Gaussian distributions.
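In symbols, a Gaussian mixture of this kind can be written as follows. The notation is an assumption introduced here for illustration: K is the number of components, and the weights, means, and covariances are written as \pi_k, \boldsymbol{\mu}_k, and \boldsymbol{\Sigma}_k. The second line gives the probability that a state \mathbf{x} belongs to cluster k, obtained from Bayes' rule.

```latex
\begin{align}
p(\mathbf{x}) &= \sum_{k=1}^{K} \pi_k \,
  \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right),
  \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1, \\
P(k \mid \mathbf{x}) &= \frac{\pi_k \,
  \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right)}
  {\sum_{j=1}^{K} \pi_j \,
  \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j\right)}.
\end{align}
```

The membership probabilities P(k | x) sum to one over k for every state, which is what distinguishes this soft assignment from the unique assignment of the partitioning and hierarchical methods.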
The partitioning, hierarchical, and mixture model approaches, applied to a sequence of maps, all have one common feature: the sequencing (order in time) of the maps is not taken into account. This is not the case with the hidden Markov method, which identifies not only preferred states but ones that are also unusually persistent. This method, based on a simple neural network approach, makes use of an underlying “hidden variable” whose evolution is modeled by a Markov process. Each state is assigned to a number of clusters, each with a certain probability, but the most likely evolution of states from one cluster to another can still be estimated. The method can be generalized by letting the evolution of the hidden state be governed by a nonstationary multivariate autoregressive factor process. The resulting cluster analysis can then also detect long-term changes in the population of the clusters.
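How the time ordering enters can be illustrated with a minimal sketch of the Viterbi algorithm, a standard way to recover the most likely sequence of hidden states in a hidden Markov model. Everything specific here is an assumption made for illustration: each map is reduced to a single scalar index, each of the two hidden states emits that index from a Gaussian with known mean and standard deviation, and the transition probabilities are fixed rather than estimated from data.

```python
import math

def log_gauss(x, mean, std):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - 0.5 * ((x - mean) / std) ** 2

def viterbi(obs, log_start, log_trans, means, stds):
    """Most likely hidden-state sequence for a Gaussian-emission hidden Markov model."""
    n_states = len(means)
    # score[s]: best log-probability of any state path that ends in state s.
    score = [log_start[s] + log_gauss(obs[0], means[s], stds[s])
             for s in range(n_states)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for x in obs[1:]:
        prev = score
        step, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            step.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + log_gauss(x, means[s], stds[s]))
        back.append(step)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical setup: two persistent regimes centred on -1 and +1.
means, stds = [-1.0, 1.0], [1.0, 1.0]
log_start = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],   # row r: transitions out of state r
             [math.log(0.1), math.log(0.9)]]
obs = [-1.2, -0.8, 0.3, -1.1, 0.9, 1.3, -0.2, 1.0]

hmm_path = viterbi(obs, log_start, log_trans, means, stds)
# Order-blind alternative: label each value by its nearest regime mean.
nearest = [min(range(2), key=lambda s: abs(x - means[s])) for x in obs]
print(hmm_path)
print(nearest)
```

With these illustrative numbers the order-blind labelling switches regime whenever a single value crosses zero, while the hidden Markov path changes regime only once, because each transition carries a probability penalty. This is the sense in which the method favors persistent states.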