Persistent Homology for Land Cover Change Detection
 Djamel BouchaffraDjamel BouchaffraCentre for Development of Advanced Technologies
 and Faycal YkhlefFaycal YkhlefCenter for development of Advanced Technologies, Algeria
Summary
The need for environmental protection, monitoring, and security is increasing, and land cover change detection (LCCD) can aid in the valuation of burned areas, the study of shifting cultivation, the monitoring of pollution, the assessment of deforestation, and the analysis of desertification, urban growth, and climate change. Because of the imminent need and the availability of data repositories, numerous mathematical models have been devised for change detection. Given a sample of remotely sensed images from the same region acquired at different dates, the models investigate if a region has undergone change. Even if there is no substantial advantage to using pixelbased classification over objectbased classification, a pixelbased change detection approach is often adopted. A pixel can encompass a large region, and it is imperative to determine whether this pixel (input) has changed or not. A changed image is compared to the available ground truth image for pixelbased performance evaluation. Some existing change detection systems do not take into account reversible changes due to seasonal weather effects. In other words, when snow falls in a region, the land cover is not considered as a change because it is seasonal (reversible).
Some approaches exploit time series of Landsat images, which are based on the Normalized Difference Vegetation Index technique. Others evaluate builtup expansion to assess urban morphology changes using an unsupervised approach that relies on labels clustering. Change detection methods have also been applied to the field of disaster management using objectoriented image classification. Some methodologies are based on spectral mixture analysis. Other techniques invoke a similarity measure based on the evolution of the local statistics of the image between two dates for vegetation LCCD. Probabilistic approaches based on maximum entropy have been applied to vegetation and forest areas, such as Hustai National Park in Mongolia. Researchers in this field have proposed an LCCD scheme based on a feedforward neural network using backpropagation for training. This paper invokes the new concept of homology theory, a subfield of algebraic topology. Homology theory is incorporated within a Structural Hidden Markov Model.
Previous Work and Motivation
This article addresses the question of whether irreversible change of land cover can be detected through construction of structural information using multidate set of images (Figure 1). Several studies have been conducted to map and identify landscape changes. Volpi, Tuia, Bovolo, Kanevski, and Bruzzone (2013) addressed the problem of highresolution ground cover change detection using support vector machine classification with radial basis function. Lu, Mausel, Batistella, and Moran (2005) developed a change detector based on a combination of perpixel impervious surface mapping with filtering and unsupervised classification. Du, Xia, and Li (2015) introduced the subpixel aspect in change detection for urban land cover analysis using a backpropagation neural network. Gong, Su, Jia, and Chen (2014) covered change detection systems using synthetic aperture radar images. The latter approach is based on fuzzy cmeans clustering using a novel Markov random field energy function. Lv, Zhong, Zhao, Jiao, and Zhang (2016) proposed an accurate detection based on a conditional random field model using high spatial resolution (HSR) imagery. Zhong, Liu, Zhao, and Zhang (2015) proposed pulsecoupled neural networks and the normalized moment of inertia feature dedicated to HSR remote sensing imagery. More recently, Gong, Zhao, and Liu (2016) proposed an LCCD system based on deep neural networks. It is worth noting that Deep Learning (DL) has become the preferred algorithm in data analysis and prediction. A key benefit of DL is its ability to model data through different levels of perception (stacked layers of transformations; Goodfellow, Bengio, & Courteville, 2016). However, DL requires an important number of samples in the training phase to build an efficient classification model. In the context of change detection, data remain scarce and therefore a DLbased model may not outperform a shallow learner.
This article introduces a novel formalism that merges probability theory (applied mathematics) with topology (pure mathematics). This approach incorporates homology theory within a structural hidden Markov model framework for land cover change detection in remote sensing. This task highlights both the effect of desertification caused by human activities as well as climate change in land cover variation. The proposed probabilistic approach, the topological hidden Markov model, predicts ordered sequences of pixels by providing clues about the shapes they form. Shape disclosure is achieved through a VietorisRips (VR) simplicial complex constructor. The latent local structures are unfolded gradually throughout the observation sequence via these simplicial complexes. These latter polytopes are characterized topologically (via a persistent homology) by their Betti numbers. Experimental results demonstrate the efficiency of the proposed methodology compared to other stateoftheart land cover change detectors.
The proposed approach to change detection fits within the realm of machine learning supplemented by clues from homology theory (Prasolov, 2007). It is different from most of the traditional available techniques in that radiometric structural changes are captured using topological qualitative features. The focus is on the search for these shapes (called local structures) formed by the data via the exploration of topological features predominant on airborne maps (Figure 2). Topological attributes are extracted via (a) VR simplicial complexes construction; (b) the concept of homology, which measures their Betti numbers (DeAlarcon, PascualMontana, Gupta, & Carazo, 2002; Edelsbrunner & Harer, 2010; Hajij, Wang, & Scheidegger, 2018); and (c) the persistent homology captured by crossvalidation. Wavelet decomposition is performed to generate a cloud of points assigned to a pixel.
A novel mathematical framework is proposed that locally incorporates topological clues (or signatures) within a twolevel HMM framework (Bengio & Frasconi, 1996; Bouchaffra, 2010, 2012; Bouchaffra, Govindaraju, & Srihari, 1999; Fine, Singer, & Tishby, 1998; Rabiner & Juang, 1993; Sanches, 2000). This mathematical approach handles the data in order to locally disclose their hidden persistent structures. The main contribution of the proposed work is fourfold; it consists of (a) generating a cloud of points through a discrete Wavelet decomposition, (b) assigning VR simplicial complexes to this cloud of points (observations), (c) constructing local structures via the use of persistent homology, and (d) building a structured HMM that integrates local structures and predicts whether a visible observation sequence has changed or not (Figure 3).
Topological Background
Some basic topological definitions deemed necessary to comprehend the subsequent work are provided in this section. The reader is advised to refer to Edelsbrunner (2014) for more comprehensive details.
Simplicial Complexes
Definition 1: A set of $\text{k}+1$ points, ${\text{x}}_{0},{\text{x}}_{1},\dots ,{\text{x}}_{\text{k}}$, is affinely independent if the $\text{k}$ vectors, ${\text{x}}_{1}{\text{x}}_{0},{\text{x}}_{2}{\text{x}}_{0},\dots ,{\text{x}}_{\text{k}}{\text{x}}_{0}$, are linearly independent.
Definition 2: A $\text{k}$simplex $\sigma $ is the convex hull of $\text{k}+1$ affinely independent points. The value of $\text{k}$ is the dimension of the $\text{k}$simplex $\sigma $ and the points ${\text{x}}_{0}$ to ${\text{x}}_{\text{k}}$ are called vertices.
Simplices of dimension $0,1,2,3$ correspond to vertices, edges, triangles, tetrahedral, respectively.
Definition 3: A face of $\sigma $ is a simplex spanned by a subset of the vertices of $\sigma $.
Definition 4: A simplicial complex is a finite collection of simplices, $\text{K}$, such that:
every simplex $\sigma \in \text{K}$, every face of $\sigma \in \text{K}$;
for every two simplices $\sigma $, $\varphi \in \text{K}$, the intersection, $\sigma \cap \varphi $, is either empty or a face of both simplices.
The dimension of $\text{K}$ is the largest dimension of any simplex that is an element of $\text{K}$. The Euler characteristic of $\text{K}$ is the alternating sum of simplex numbers:
where $\text{k}$ denotes the dimension of $\text{K}$ and ${\text{s}}_{\text{i}}$ corresponds to the number of $\text{i}$simplices, with $0\le \text{i}\le \text{k}$.
Abstract Simplicial Complex
Assume a finite collection of elements called vertices.
Definition 5: An abstract simplicial complex is a system of subcollection, $\text{A}$, such that $\text{u}\in \text{A}$ and $\text{v}\subseteq \text{u}\Rightarrow \text{v}\in \text{A}$. The sets $\text{u}$ are called abstract simplices.
The dimension of an abstract simplex is (card1), where “card” represents the simplex cardinality. However, the dimension of $\text{A}$ corresponds to the maximum dimension of any of its abstract simplices. The simplicial complex $\text{A}$ is drawn in a Euclidean space via a mapping of (a) each vertex to a point in the $\text{d}$dimensional Euclidean space ${\mathbb{R}}^{\text{d}}$ and (b) each abstract $\text{k}$simplex to the convex hull of the $\text{k}+1$ corresponding points. If this drawing obeys the two conditions of definition 4, then it is referred to as a geometric realization of $\text{A}$. For example, a onedimensional abstract simplicial complex is a graph, and a geometric realization is its straightline embedding.
VietorisRips Simplicial Complex
The geometrical constructors known as VR simplicial complex are defined as follows.
Definition 6: Letting $\text{R}\ge 0$ be a real number, and $\text{S}$ a finite set of points in ${\mathbb{R}}^{2}$, the VR complex of $\text{S}$ and $\text{R}$, denoted as VietorisRips(R), consists of all abstract simplices in $2\text{S}$ whose vertices are at most a distance $2\text{R}$ from one another.
In other words, any two vertices at a distance at most $2\text{R}$ from each other are connected by an edge, and a triangle (or higherdimensional simplex) is addended to the complex if all its edges are in the complex (Figure 4).
Persistent Homology and Betti Numbers Computation
The topological characterization of simplicial complexes is performed through the concept of persistent homology whereby Betti numbers are computed. The theory of persistence is inspired by Morse theory and associated with spectral sequences. It brings a quantitative constituent to the Betti numbers that is steady under disturbances. It aims to disclose the different scales under which a cloud of points can be perceived in one and only one coherent formalism.
Let C be a simplicial complex such as (a) isolated point, (b) line segment, (c) triangle, and (d) tetrahedron obtained using VR constructor. Larger simplicial complexes are made up of these basic ones using different values of d (Figure 5). Therefore, several simplicial complexes are exhibited in order to approximate the true shape of the cloud of points (data set). The ultimate goal of persistent homology consists of determining the one that is the most representative of the true shape formed by the cloud of points. It corresponds to the one that produces the qualitative noiseless features.
Birth and Death
The concept of birth and death is addressed in this section. The starting point is a sequence of homomorphisms connecting the homology groups of the complexes in the filtration. For $\text{i}\le \text{j}$, ${\text{K}}_{\text{i}}$ represents a subcomplex of ${\text{K}}_{\text{j}}$, which can be viewed as an injective map ${\text{f}}^{\text{i},\text{j}}\text{:}\phantom{\rule{0.2em}{0ex}}{\text{K}}_{\text{i}}\_\to {\text{K}}_{\text{j}}$. It maps a simplex in ${\text{K}}_{\text{i}}$ to the same simplex in ${\text{K}}_{\text{j}}$. This is why it is called the inclusion map. It carries over to an inclusion map on the $\text{p}$cycles, ${\text{f}}^{\text{i,j}}{}_{\text{p}}\text{:}\phantom{\rule{0.2em}{0ex}}{\text{Z}}_{\text{p}}({\text{K}}_{\text{i}})\_\to {\text{Z}}_{\text{p}}({\text{K}}_{\text{j}})$. This in turn creates a map on homology: ${\mathrm{\varphi}}^{\text{i},\text{j}}{}_{\text{p}}:{\text{H}}_{\text{p}}({\text{K}}_{\text{i}})\to {\text{H}}_{\text{p}}({\text{K}}_{\text{j}})$, which does not often represent an inclusion map. More precisely, if $\mathrm{\delta}$ is a class in ${\text{H}}_{\text{p}}({\text{K}}_{\text{i}})$, and $\text{z}\in {\text{Z}}_{\text{p}}\left({\text{K}}_{\text{i}}\right)$ is a representative cycle, let ${\varphi}^{\text{i},\text{j}}{}_{\text{p}}(\mathrm{\delta})$ be the class in ${\text{H}}_{\text{p}}({\text{K}}_{\text{j}})$ that contains ${\text{f}}^{\text{i,j}}{}_{\text{p}}(\text{z})$. It accepts as input a $\text{p}$cycle in ${\text{K}}_{\text{i}}$ and sends it to ${\text{K}}_{\text{j}}$ For example, if $\text{\delta}$ encloses a hole in ${\text{K}}_{\text{i}}$ that fills up at the time ${\text{K}}_{\text{j}}$ is reached, then ${\varphi}^{\text{i},\text{j}}{}_{\text{p}}$ maps $\text{\delta}$ to $0\in {\text{H}}_{\text{p}}\left({\text{K}}_{\text{j}}\right)$. The image of ${\varphi}^{\text{i},\text{j}}$ p is called a persistent homology group. It includes all $\text{p}$dimensional homology classes that already have representatives in ${\text{K}}_{\text{i}}$. A class $\text{\delta}\in {\text{H}}_{\text{p}}\left({\text{K}}_{\text{i}}\right)$ is born at ${\text{K}}_{\text{i}}$ if $\text{\delta}$ is not present in the image of ${\varphi}^{\text{i}\text{1},\text{i}}{}_{\text{p}}$, and a class born at ${\text{K}}_{\text{i}}$ dies entering ${\text{K}}_{\text{j+1}}$ if ${\varphi}^{\text{i},\text{j}}{}_{\text{p}}(\mathrm{\delta})$ is absent in the image of ${\varphi}^{\text{i}\text{1},\text{j}}{}_{\text{p}}$ but ${\varphi}^{\text{i},\text{j}+\text{1}}{}_{\text{p}}(\text{\delta})$ is present in the image of ${\varphi}^{\text{i}\text{1},\text{j}+\text{1}}{}_{\text{p}}$.
Betti Numbers
A simplicial complex exhibits a group structure through the addition of $\text{k}$simplices. The resulting free group is called the chain group, ${\text{C}}_{\text{k}}$. The notion of Betti numbers emanates from the quotient group ${\text{H}}_{\text{k}}={\text{Z}}_{\text{k}}/{\text{B}}_{\text{k}}$, in which ${\text{Z}}_{\text{k}}$ denotes the group structure of all $\text{k}$chains that have empty boundary; and ${\text{B}}_{\text{k}}$, subgroup of ${\text{C}}_{\text{k}}$, is also a subgroup of ${\text{Z}}_{\text{k}}$, called the boundary group. The number of distinct equivalent classes of ${\text{H}}_{\text{k}}$ corresponds to the $\text{k}$th Betti number denoted by ${\beta}_{\text{k}}$. Indeed, the $\text{k}$th Betti number tallies the number of $\text{k}$dimensional holes in the simplicial complex. In the case where $\text{k}=0$, ${\beta}_{0}$ counts the number of pathconnected components of the simplicial complex. In the case of subsets of the Euclidean space ${\mathbb{R}}^{3}$, ${\beta}_{\text{1}}$ counts the number of independent tunnels (holes) and ${\beta}_{\text{2}}$ counts the number of cavities, or enclosed voids. For example, in the case of the solid torus, the Betti numbers are ${\beta}_{0}=\text{1}$, ${\beta}_{\text{1}}=\text{1}$, and ${\beta}_{\text{2}}=0$. Therefore, one can state that a torus is topologically represented by the series $1,1,0$. However, the surface of a torus has ${\beta}_{0}=\text{1}$ (one connected component), ${\beta}_{\text{1}}=\text{2}$ (the one in the center and the one in the middle), and ${\beta}_{\text{2}}=\text{1}$ (the inside of the donut). The idea of describing local structures via Betti numbers is much more efficient than the one using Euler characteristic (a connectivity measure that is a zerodimensional Minkowski functional).
Barcodes
Betti numbers are computed for each simplicial complex (by varying $\text{d}$). From each stage to the next, pairing up the births and deaths, a set of intervals or bars called the barcode of the filtration is obtained. Each bar represents a class in one of the homology groups and thus has a definite dimension. Figure 6 depicts an example of a cloud of five points representing a “house with 1 hole.” The persistent homology (barcodes) discloses for $2\le \text{d}\le 2.8$, one connected component $({\text{\beta}}_{0}=1)$, one hole $({\text{\beta}}_{1}=1)$, and no “voids” or “cavities” $({\text{\beta}}_{2}=0)$, yielding a series of Betti numbers $1,1,0$.
Local Structures and Persistent Homology
Because the mission consists of identifying change in the local structures assigned to sequences of observations when viewed at two different points in time, it is therefore crucial to characterize a local structure mathematically. The question one investigates is twofold: (a) what would a local latent structure be and (b) when would this local structure be obtained when there are many possible $\text{d}$values (multiple resolutions) associated with a VR simplicial complex? Since a latent local structure corresponds to a conformation made of a group of symbols depicting a visible observation sequence, it is reasonable to invoke a geometric constructor (such as VR simplicial complex) to infer the true nature of this shape. All basic symbols of an observation sequence are therefore viewed as a cloud of ndimensional points in a Euclidean space. The simplicial complexes are built from this cloud and then qualitative topological information (Betti numbers) is successively extracted from them. However, the optimal ${\text{d}}^{*}$ value (that minimizes the error rate) among all possible $\text{d}$values has been computed using a crossvalidation scheme. This choice enables extraction of the most informative qualitative features, which are merely the persistent ones across multiple resolution levels $\text{d}$.
Proposition: A feature vector that contains persistent topologic signatures represents an element of a cluster: This cluster regroups similar simplicial complexes (Betti numbers).
For each visible sequence $\mathcal{O}={\text{O}}_{1}{\text{O}}_{2}\dots {\text{O}}_{\text{T}}$, $\text{T}$ simplicial complexes are thus constructed; each one is associated to a visible observation sequence ${\text{O}}_{\text{i}}\text{(i}=\text{1},\dots ,\text{T)}$. For a fixed $\text{d}$ value, VR simplicial complexes are finally captured and represented by feature vectors (made of topological attributes) that are further partitioned using a Dirichlet mixtures distribution clustering algorithm (Gorur & Rasmussen, 2010).
Definition 7: A local structure assigned to a simplicial complex (or to an observation sequence ${\text{O}}_{\text{i}}$) is a cluster ${\text{S}}_{\text{i}}$ of feature vectors produced by the optimal DMD clustering.
Once the optimal clustering is computed, each cluster $\text{Si}$ (grouping similar shapes) is defined as a local latent structure. Each shape is assigned to a constituent ${\text{O}}_{\text{i}}$ of the entire observation sequence $\mathcal{O}$ (Figure 7).
Topology within a Hidden Markov Model Framework
To better position this contribution, the traditional concept of HMM is briefly introduced as a benchmark to the proposed methodology.
Traditional HMM
The HMM approach views an observation sequence $\mathcal{O}={o}_{1}{o}_{2}{o}_{3}\dots {o}_{T}$ as a series of symbols produced by a sequence of hidden states with a certain emission probability distribution. Each symbol is emitted from one single hidden state using this probability distribution, and the hidden state distribution is a Markov chain. The following definition exhibits the statistical parameters of a tradition HMM.
Definition 8: An HMM is a triplet characterized by $\text{\lambda}=[\text{\pi},\text{A},\text{B}]$, where
$\text{\pi}=[{\text{\pi}}_{\text{i}}]$ is the initial state probability vector,
$\text{A}=[{\text{a}}_{\text{ij}}]$ is the hidden state transition probability matrix from state ${\text{q}}_{\text{i}}$ to state ${\text{q}}_{\text{j,}}1<\text{i,}\phantom{\rule{0.2em}{0ex}}\text{j}<\text{N}$,
$\text{N}$ is the number of hidden states in the model, and
$\text{B}=[{\text{b}}_{\text{j}}(\text{t})]$ is the state conditional probability matrix of the emitted symbol ${\text{o}}_{\text{t}}$ given a hidden state ${\text{q}}_{\text{j}}$, with $({\sum}_{\text{t}}{\text{b}}_{\text{j}}(\text{t})=1)\forall \text{j}$, $\text{T}$ is the number of observations ${\text{o}}_{\text{i}}$, each one of length ${\text{r}}_{\text{i}}$.
As an example, Figure 8 depicts a simple HMM graph.
Three problems are assigned to an HMM: (a) probability evaluation, (b) statistical decoding, and (c) parameter estimation (or training). In summary, the evaluation problem in the HMM consists of evaluating the probability for the model $\text{\lambda}=[\text{\pi},\text{A},\text{B}]$ that most likely produces the sequence $\mathcal{O}$. This probability can be expressed as
Using state conditional independence of the observation sequence $\mathcal{O}$, one obtains
The evaluation problem is based on the observation state conditional independence; however, there are numerous applications in which the conditional independence condition does not hold.
Topological HMM: Mathematical Framework
Because the proposed approach extracts qualitative features (using homology theory) from local shapes described by simplicial complexes, it is called a topological hidden Markov model (THMM). To provide a complete introduction of the THMM formalism, this section discusses the THMM context, background on the VR simplicial complexes constructor, and search and characterization of local structures.
THMM Context
In order to capture shapes, the THMM approach views an observation sequence $\mathcal{O}={O}_{1}{O}_{2}\dots {O}_{T}$ of length $\text{T}$ as a series of constituents ${O}_{i}$ made of strings of symbols ${o}_{ij}\in \sum $ interrelated in some way. Each observation sequence $\mathcal{O}$ is not only one sequence in which all constituents are conditionally independent but also a sequence that is divided into a series of $\text{T}$ strings ${O}_{i}$$({O}_{i}={o}_{{i}_{1}}{o}_{{i}_{2}}\dots {o}_{i{r}_{i}})$, $(1\le \text{i}\le \text{T})$, (${r}_{i}$ is the length of ${O}_{i}$). Each ${O}_{i}$ is produced from a set of local structures ${\text{S}}_{\text{i}}$. The observed symbols ${o}_{ij}$ are emitted from hidden states ${\text{q}}_{\text{ij}}$.
Sequence Evaluation
Let $\mathcal{O}=({O}_{1}{O}_{2}\dots {O}_{T})$ be the entire sequence of observations. Each observation ${O}_{i}$ is by itself a sequence of symbols ${\text{o}}_{\text{ij}}$, (i.e.,${O}_{i}={o}_{i1}{o}_{i2}\dots {o}_{iri}$) and is assigned to a structure $\text{Si}$. In other words, the entire visible observation sequence $\mathcal{O}$ reveals a sequence of local structures $\text{S}={\text{S}}_{\text{1}}{\text{S}}_{\text{2}}\dots {\text{S}}_{\text{T}}$. The THMM takes into account the observation sequences with their latent structures. Therefore, the classification problem can be cast into the determination of the class ${\text{\omega}}_{\text{i}}$ ($\text{i}=\text{1},\text{2},\dots ,\text{c}$, where $\text{c}$ is the total number of classes) that maximizes the probability of the entire observation $\mathcal{O}$ given a class model $\text{\omega}$. This statement can be written as
However, because (a) of the conditional independence of the ${\text{O}}_{\text{i}}$ with respect to the event $\{\text{S},\text{\omega}\}$ and (b) each ${\text{O}}_{\text{i}}$ depends only on its corresponding local structure $\text{Si}$, $\text{Prob}(\mathcal{O}\text{S},\text{\omega})$ and $\text{Prob}(\text{S\omega})$ are computed as
Likewise, it is reasonable to assume that the local structure distribution is a Markov chain of order 1, and therefore one can write
Finally, by regrouping the product terms in equations (2) and (3), one can express $\text{Prob}(\mathcal{O},\text{S\omega})$ as follows:
It may be helpful to focus on the different terms separately. It is worth underscoring that $\text{Prob}({\text{O}}_{\text{i}}{\text{S}}_{\text{i}})$ is difficult to estimate; however, the converse $\text{Prob}({\text{S}}_{\text{i}}{\text{O}}_{\text{i}})$ conveys more meaning because it invokes properties of the VR constructor. Therefore, using Bayes’s rule, one can write
Therefore, equation (6) can be transformed into
One can decompose $\text{Prob}({\text{O}}_{\text{i}}\text{\omega})$ as
where ${\text{O}}_{\text{i}}={\text{o}}_{\text{i}1},{\text{o}}_{\text{i}2},\dots ,{\text{o}}_{\text{iri}}$, and ${\text{Q}}_{\text{i}}={\text{q}}_{\text{i}1},{\text{q}}_{\text{i}2},\dots ,{\text{q}}_{\text{iri}}$. Therefore, equation (10) can be decomposed into
It is assumed that an observation ${\text{o}}_{\text{ij}}$ depends only on the hidden state ${\text{q}}_{\text{ij}}$ from which this observation has been emitted. Likewise, a hidden state ${\text{q}}_{\text{ij}}$ depends only on the previous hidden state ${\text{q}}_{\text{i}(\text{j}1)}$ (firstorder Markov chain). Using these assumptions, equation (11) is transformed into the following:
This latter expression requires an estimation of $\text{Prob}({\text{o}}_{1}{\text{q}}_{\text{i}})$ and $\text{Prob}({\text{q}}_{\text{i}}{\text{q}}_{\text{i}1})$. Finally, one derives the classification criterion, which is
Equation (12) allows classifying test observation sequences after an optimal training is conducted. Figure 9 depicts the graph of a THMM.
Definition 8: A topological HMM is a quintuplet $\text{\gamma}=\left[\text{\pi},\text{A},\text{B},\text{C},\text{D}\right]$ such that
vector $\text{\pi}=[{\text{\pi}}_{\text{i}}\text{]}\phantom{\rule{0.2em}{0ex}}\text{=}\phantom{\rule{0.2em}{0ex}}{\text{[prob}({\text{q}}_{\text{1}}),\dots ,\text{Prob}({\text{q}}_{\text{n}})]}^{\text{T}}$; $\text{matrix}\phantom{\rule{0.2em}{0ex}}\text{A}=[{\text{a}}_{\text{ij}}]=\text{Prob}({\text{q}}_{\text{i}}{\text{q}}_{\text{j}})$; $\text{matrix}\phantom{\rule{0.2em}{0ex}}\text{B}=[{\text{b}}_{\text{ij}}]=\text{Prob}({\text{o}}_{\text{i}}{\text{q}}_{\text{j}})$; $\text{matrix}\phantom{\rule{0.2em}{0ex}}\text{C}=[{\text{c}}_{\text{ij}}]=\text{Prob}({\text{S}}_{\text{j}}{\text{S}}_{\text{i}})$; $\text{matrix}\phantom{\rule{0.2em}{0ex}}\text{D}=[{\text{d}}_{\text{ij}}]=\text{Prob}({\text{S}}_{\text{i}}{\text{O}}_{\text{j}})$ with the natural constraints as in the traditional HMM framework.
Parameter Estimation
The parameters $\text{\pi}$, $\text{A}$, and $\text{B}$ are computed in the same way as in traditional HMM using a BaumWelch parameter estimation (or expectation maximization), however the matrices $\text{C}=[{\text{c}}_{\text{ij}}]$ and $\text{D}=[{\text{d}}_{\text{ij}}]$ that unfold the local structures are estimated as follows:
${\tilde{\text{d}}}_{\text{ij}}=\frac{\text{number of times k}{\text{nearest neighbors associated to O}}_{\text{j}}\phantom{\rule{0.2em}{0ex}}{\text{contained in S}}_{\text{i}}}{\left{\text{cluster S}}_{\text{i}}\right},$
${\tilde{\text{c}}}_{\text{ij}}=\frac{\text{number of occurences of the pair}\phantom{\rule{0.2em}{0ex}}<{S}_{\text{i}},{\text{S}}_{\text{j}}>\phantom{\rule{0.2em}{0ex}}in\phantom{\rule{0.2em}{0ex}}all\phantom{\rule{0.2em}{0ex}}training\phantom{\rule{0.2em}{0ex}}sequences}{<{\text{S}}_{\text{i}},+>},$
where $\left{\text{cluster S}}_{\text{i}}\right$ designates the cardinality of cluster ${\text{S}}_{\text{i}}$ and $<{\text{S}}_{\text{i}},+>$ represents the number of times the cluster ${\text{S}}_{\text{i}}$ occurs in the training set.
THMM Change Detection
This section explains the phases needed to build the THMM for land cover change in remote sensing.
Remote Sensing Feature Generation
Before any classification task such as change detection can take place, one has to extract discriminative and less costly features. It is well known that the classifier power generalization depends in part on the dimension of the feature space. Feature generation (aka extraction) in change detection represents a challenging and tedious task due to diverse regional characteristics such as (a) area scattering: urban regions usually account for small areas and are scattered in different locations; (b) large regions: vegetation and bare soil region represent large and homogeneous surfaces; and (c) thematic similarity: spectral features between some of the thematic classes such as “less dense urban” and “bare soil” are very much alike especially when the spatial resolution is limited. In traditional approaches, feature extraction in change detection is usually performed using software such as ENVI and eCongnition. These programs invoke spatial features (e.g., texture, area, orientation, size) as well as spectral attributes (e.g., histogram depicting occurrence frequencies of gray values, pixel radiometric values in different bands). Because this work is driven by the extraction of local structures that account for image spatial changes between two points in time, it is deemed sufficient to rely on radiometric values at different bandwidths to represent a pixel in an image.
Building VR Simplicial Complexes and Local Structures from a Pixel
The amount of variation of pixels’ radiometric values is an indicator of the type of change in the image. For example, in the case of a disaster, the image exhibits a drastic radiometric change, whereas in the case of urban growth, this radiometric variation is rather smooth. Each image pixel contributes to the formation of a simplicial complex. An observation sequence that represents a series of radiometric vectors in three bands is associated with a pixel. This sequence represents a cloud of points from which a simplicial complex is built. A twodimensional discrete wavelet transform (DWT) is applied to the entire image of dimension $(\text{r,c})$ for each band out of three. Therefore, this decomposition produces an average image and a sequence of detailed images in each scale. The inverse function of the wavelet transform is applied to these images in order to restore the original dimension $(\text{r,c})$. The goal of this decomposition is to associate each pixel of the original image to its corresponding pixel in a onetoone mapping.
Let $\text{D}\phantom{\rule{0.2em}{0ex}}\text{=}\left[{\text{A}}_{\text{j}},\left\{{\text{D}}_{\text{j}}^{\text{k}}\right\},\text{j}=1,\dots ,\text{J and k}=1,\dots ,\text{K}\right]$ be the decomposition in which ${\text{A}}_{\text{j}}$ refers to the approximation image at the $\text{j}$th scale and ${\text{D}}_{\text{j}}^{\text{k}}$ is the detail image at scale $\text{j}$ and orientation $\text{k}$. For the work described herein, a wavelet decomposition with four levels $\text{(scale}\phantom{\rule{0.2em}{0ex}}\text{J}=4)$ and three orientations $\text{(K}=3)$ for each level, was undertaken. For each scale, the $\text{i}$th pixel in the original image is associated with a vector containing its three values in the detail images. This procedure is repeated for each of the four scales and then for each band. Hence, DWT decomposition produces a set of 12 points in the details vector space. Figure 10 shows an example of a simplicial complex (built from the 12 points generated from a single pixel using DWT) in the 3D frequency bands. Betti numbers’ qualitative features are extracted from this simplicial complex and then assigned to its corresponding local structure cluster (a class of Betti numbers). This information is fed into the THMM framework for the purpose of classification.
THMM in Change Detection
Change detection is a classification problem with only two classes—Changed and Unchanged—that applies to a data sequence extracted from the pixel of the remote satellite image at two different points in time, ${\text{T}}_{1}$ and ${\text{T}}_{2}$. The THMM formalism fits seamlessly with the change detection application at hand. Because a pixel is an input, a VR simplicial complex is extracted from this pixel.
he entire sequence is $\mathcal{O}={\text{O}}_{1}{\text{O}}_{2},$ whereby each ${\text{O}}_{\text{i}}={\text{o}}_{\text{i}1}{\text{o}}_{\text{i}2}\dots {\text{o}}_{\text{i}12}$ designates the 12 3D points associated with a pixel of the image at time ${\text{T}}_{\text{i}}(\text{i}=1,2)$. A local latent structure $\text{Si}$ mapped to each ${\text{O}}_{\text{i}}(\text{i}=1,2)$ corresponds to the Betti number cluster computed from the simplicial complex generated by the VR constructor. In conclusion, the following instantiations are made:
The classes ${\text{\omega}}_{1}$ and ${\text{\omega}}_{2}$ correspond to Changed and Unchanged categories, respectively.
The sequence ${\text{O}}_{\text{i}}$ corresponds to the 12 3D points generated for a pixel using DWT.
The symbol $\text{Si}$ represents the Betti number class (a cluster, aka local structure) assigned to the sequence ${\text{O}}_{\text{i}}.$A symbol ${\text{o}}_{\text{ij}}$ of a sequence ${\text{O}}_{\text{i}}$ represents a 3D point among these 12 points.
A hidden state q_{i} is a frequency band $\text{Bi}$ selected during the DWT procedure. A set of frequency bands has been selected from which the 12 points have been generated.
Finally, maximum likelihood is applied in the computation of the best ${\text{\omega}}^{*}$ among ${\text{\omega}}_{1}$ and ${\text{\omega}}_{2}$ according to the following criterion:
It is worth noting that maximum likelihoodbased technique is the one most commonly used for land cover classification especially for medium resolution imagery. However, classification based on maximum likelihood alone has many limitations in separations of remote sensing data such as saltandpepper effects and challenges caused by mixed pixels, especially for highresolution images. To overcome this drawback, maximum likelihood is used only in the computation of the best class ${\text{\omega}}^{*}$ among ${\text{\omega}}_{1}$ and ${\text{\omega}}_{2}$ that optimizes the criterion defined in Equation 10. It is worth noting that during the supervised training, the sequences $\mathcal{O}={\text{O}}_{1}{\text{O}}_{2}$ are tagged changed or unchanged, depending on whether a pixel has changed or not in the image viewed at time ${\text{T}}_{2}$. The ratio of the number of changed pixels to the total number of pixels in the image represents the magnitude of change. Figure 11 illustrates the instantiation process of the THMM for change detection.
Experiments
Data Collection
To analyze the efficiency of the proposed model, two change detection data sets were selected, namely the Change Detection dataset and Sztaki AirChange Benchmark. The Change Detection data set comprises 1,000 pairs of images with $(800\times 600)=480,000$ pixels each. An image pair corresponds to one reference image and one test (or moving) image. Each pair is associated with a binary ground truth image with the same resolution. The images were constructed by the serious game Virtual Battle Station 2 with Bohemia Interactive Simulations. The data set consists of 100 different scenes containing several thematic (vegetation, water, buildings) and moderate ground reliefs. Each scene was viewed from five different angles, where camera positions were varied at steps of 10 degrees according circle of radius 100 meters at approximately 250 meters high, and with a fixed tilt of about 70 degrees. All images were acquired with a resolution of about 50 cm per pixel. To obtain more information about this data set, please refer to the “Change Detection dataset”. Figure 12 depicts a sample of image pairs at two points in time ${\text{T}}_{1}$ and ${\text{T}}_{2}$ and their ground binary truth images, highlighting the changed area with white dots.
The Sztaki AirChange Benchmark is the second database used. It comprises 13 aerial image pairs of size $952\times 640$ and with a resolution of 1.5 meters and their corresponding ground of truth. Images constituting each pair were acquired at different times over several years, allowing the generation of relevant changes, such as new builtup regions, building operations, planting of large groups of trees, and so on. This database also contains challenging problems in which reversible changes are considered as unchanged, where variations of the vegetation greenery or even shadows are labeled as unchanged in the ground truth. Figure 13 illustrates some samples of image pairs and their corresponding masks (ground truth images).
Training Phase
To measure the generalization power of the THMM classifier, a threefold crossvalidation estimation technique is performed in the entire set of 1,000 pairs of pixels. This set is divided into three subsets of 333 pairs of images each, and selected two subsets (666 pairs) for training and one subset for testing (333 pairs). All feature vectors are extracted using the MATLAB platform (MATLAB, 2015). The visible observation sequences $(\mathcal{O}={O}_{1}{O}_{2})$ are extracted from these 666 pairs of images. This procedure is performed using pixels from both classes, Changed and Unchanged, separately. Five Gaussian mixtures $(\text{M}=5)$ are used for the estimation of the emission probability values $\text{Prob}(\text{o}{}_{\text{t}}{\text{q}}_{\text{t}})$. Therefore, two optimal models ${}_{1}^{*}=[{\pi}_{1}^{*},{A}_{1}^{*},{B}_{1}^{*},{C}_{C}^{*},{D}_{1}^{*}]$ and ${}_{2}^{*}=[{\pi}_{2}^{*},{A}_{2}^{*},{B}_{2}^{*},{C}_{2}^{*},{D}_{2}^{*}]$ are obtained during each training phase. Testing is conducted following each training phase. This procedure is repeated three times with three different subsets for testing. The predicted classification for each pixel is compared to the corresponding pixel in the ground binary truth image. The results in each training/testing procedure are finally averaged to obtain an overall classification performance.
Testing: Decision and Classification
It is well known that a classification problem taking into account a reject option (whereby the classifier has the possibility to hold back its affectation) incurs a cost lower than misclassification (Duda, Hart, & Stork, 2001). Indeed, there are a significant number of cases in which radiometric values of a group of pixels have changed slightly because of a modification in the weather conditions (e.g., presence of clouds). These image regions are often classified as Changed, but no change has actually occurred. The geometric characterization of local structures assigned to these regions can only partially mitigate this false positive effect. A reject threshold can be determined via the risk minimization given by loss matrix. However, the loss matrix element $\text{\lambda}$ representing the loss incurred for choosing the reject option is computed using a selected validation set. Furthermore, when the THMM classifier makes a decision for a test observation sequence, it should take into account both the potential gain and potential loss. An accepted lowrisk pixel increases profit, while a rejected highrisk pixel decreases loss. The situation is much more critical and far from symmetric in the area of remote sensing change detection. Denote ${\text{\alpha}}_{\text{i}}(\text{i}=\text{1},\text{2})$ the action of assigning a single pixel (viewed via $\mathcal{O}$)to class ${\text{\omega}}_{\text{i}}$ (${\text{\omega}}_{1}$ = Changed and ${\text{\omega}}_{2}$ = Unchanged, respectively) and ${\text{\alpha}}_{\text{3}}$ the action assigned to the “reject” (or doubt) decision. Likewise, designate ${\lambda}_{ik}$ as the loss incurred for taking action ${\alpha}_{i}$ when the input actually belongs to class ${\omega}_{k}$. For an optimal classifier performance, statistical decision theory advocates the action ${\alpha}_{i}$ with a minimum conditional risk:
A correct classification is assigned a zero loss; however, the classification errors are not equally costly in the remote sensing change detection problem. In fact, wrongly classifying the changed pixels is more critical than misclassifying unchanged pixels. To discriminate wrong decision losses, the parameters ${\text{\lambda}}_{12}$ and ${\text{\lambda}}_{21}$ are set to 1 and 0.5, respectively. The loss matrix is then expressed as
Therefore, the conditional risk of reject is
To precisely evaluate the system performance, the receiver operating characteristic (ROC) curve is invoked (Li & Guo, 2014). The results from the folds are then averaged for each performance measure using kfold crossvalidation estimation technique to produce a single estimation.
Experimental Results and Interpretation
The THMM approach has been compared with the traditional HMM. In the HMM framework, a hidden state is assigned a discrete value representing one of the thematic classes (such as “water,” “grass,” etc.). The image was first segmented into thematic classes and each pixel belongs to one thematic class. The THMM has been implemented using the VR approach using the crossvalidation criterion that computes the optimal $\text{R}*$ with the maximum accuracy (Figure 14). It is worth noting that the classification performance depends on the quality of features and the classifier’s parameters. Different connectivity parameters have been tested during the training phase (parameter tuning phase). The $\text{R}$value $(\text{d}=2\text{R})$ that has produced the highest accuracy (95%) has been selected $({\text{R}}^{*}=0.9)$.
This selection of a unique $\text{d}$ value represents one way to proceed because a filtration (for continuous $\text{d}$values) of a finite simplicial complex in which persistent homology exhibits qualitative features (barcodes) can also be performed. The choice of selecting a unique $\text{d}$ value is justified by the risk of extracting data inherent noise or artifacts. Furthermore, an experimental classification reject threshold value is set to $\text{\lambda}=0.02$. Figure 15 illustrates an example of change detection result obtained on a test image applied to a change detection data set. Figure 16 depicts a sample of qualitative results over three aerial data sets, namely the Szada, Archive, and Tiszadob. The obtained binary images correspond perfectly to the change mask of the ground truth. Furthermore, these results show that the proposed approach produces homogeneous and smoothing effect. One of the main strengths of the proposed method based on VR simplicial complex representation is its ability to distinguish between real changes and false changes caused by shadows, or seasonal weather effects.
For comparison purposes, three classifiers have been implemented: (a) an HMM based only on radiometric values (b) and an HMM that takes into account textural information assigned to a pixel. In order to conduct a comparison between these three classifiers, different values of the classification reject threshold $\text{\lambda}$ that allows different loss matrices to be exhibited are considered. The ROC curves of these classifiers are computed and depicted in Figure 17.
As a second experiment task, the THMM has been compared with other competitive techniques in the field of machine learning (see Dai & Khorram, 1999; Volpi et al., 2013). These techniques were brought inside a MATLAB platform. To ensure reproducibility of the results, this comparative work is conducted on the same database, Change Detection dataset. In other words, some results from the stateoftheart benchmarked machine learning techniques are provided to derive a global assessment and to widely position the contribution of the THMM formalism. To promote reproducibility and perform a fair model comparison between different change detection approaches, the same set of features prior to classification are extracted.
Tables 1 and 2 indicate that the THMM outperforms some stateofthearts machine learning methodologies when applied to LCCD on both databases (Change Detection dataset and Sztaki AirChange Benchmark). For the purpose of comparison, experiments were conducted on both publicly available data sets, allowing for a fair evaluation. In Figure 17, the area under curve assigned to the THMM is larger than the one assigned to HMMs. The results shown in Tables 1 and 2 clearly underscore the advantage of the THMM over traditional change detection classifiers. The THMM outperforms the other methods because the proposed formalism exploits geometric information, which is highly sensitive to real changes (expressed by geometric representation of pixel sequence). However, these geometric clues cannot always distinguish between real and false changes. According to the geometric formalism, many reversible changes are also labeled as changes. The topological aspect was considered by introducing Betti numbers in extracting features to separate false changes from detected pixels.
Table 1. Comparison of the THMM Approach With Current Machine Learning Techniques on Change Detection Datasets
Approach 
Change Detector 
Overall Accuracy 


Dai & Khorram, 1999 
Pixel radiometric values at different bands and different times of an image pertaining to the change detection data sets are sequentially fed to the change detector, which is a feedforward neural network. 
FeedForward Neural Networks with three layers and four classes (16 change classes). 
95.6% 
Volpi et al., 2013 
Pixel radiometric values at different bands and different times of an image extracted from the change detection data sets are sequentially fed to the change detector, which is a support vector machine. 
The change detection system is a support vector machine with a Gaussian radial basis function as a kernel. The oneagainstall scheme is adopted with four classes. 
92.12% 
THMM Proposed Methodology 
Topological descriptors are embedded within a structural HMM type: They are extracted from the observation sequence using the concept of Betti numbers applied to VR simplicial complexes. 
Given two models ${\text{\omega}}_{1}$ and ${\text{\omega}}_{2}$, and a sequence of observations with its topological structure series: compute the model among the two that most likely has generated this sequence. 
95.10% 
Note. THMM, Topological Hidden Markov Model; VR, VietorisRips.
Table 2. Comparison of the THMM Approach With Current Machine Learning Techniques on Sztaki AirChange Benchmark
Approach 
Change Detector 
Accuracy 


Singh, Kato, Zoltan, and Zerubia, 2014 
Information from modified histogram of oriented gradients difference and the graylevel difference 
A threelayer Markov random field takes into account information from two different sets of features, namely the modified HOG difference and the graylevel difference. 
94.65% 
Benedek and Tamás, 2009 
Global intensity statistics with local correlation and contrast features, applying the global energy optimization process to obtain a simultaneously optimal local feature selection 
Multilayer conditional mixed Markov model. 
95% 
THMM (proposed methodology) 
Topological descriptors are embedded within a structural HMM type: They are extracted from the observation sequence using the concept of Betti numbers applied to VR simplicial complexes. 
Given two models ${\text{\omega}}_{1}$ and ${\text{\omega}}_{2}$, and a sequence of observations with its topological structure series: compute the model among the two that most likely has generated this sequence. 
95.84% 
Note. HOG, Histogram of Oriented Gradients; THMM, Topological Hidden Markov Model; VR, VietorisRips.
Conclusion
The THMM presented in this article is a probabilistic land cover change detector that brings about novel information related to the ecological system of a landscape and how people are using this land. Clues about climate change are also inferred from the results obtained. This detector accounts for shapes formed by a sequence of pixels. The topological extension of HMMbased sequence classifiers represents an important step in combining statistics with homology theory in the context of change detection in remote sensing. The experimental results obtained are encouraging because the THMM has outperformed some traditional approaches. This success demonstrates the need for deciphering the conformation of an entire observation sequence. Furthermore, according to the same experiments, the performance of the THMM within the machine learning arena remains significant (around a 4% margin). Future work should focus on determining another criterion for an optimal persistent homology that produces a global (and not local) optimum of the classifier error rate.
Acknowledgment
We would like to show our gratitude to Dr. Nabil Zerrouki for conducting a part of the experiments in this research.
Further Reading
 AlKhudhairy, D., Caravaggi, I., & Giada, S. (2005). Structural damage assessments from Ikonos data using change detection, objectoriented segmentation, and classification techniques. Photogrammetric Engineering and Remote Sensing, 71(7), 825–837.
 Bouchaffra, D., Cheriet, M., Jodoin, P. M., & Beck, D. (Eds.). (2015). Machine learning and pattern recognition models in change detection [Special Issue]. Pattern Recognition, 48(3), 613–615.
 Huang, B., Li, Z., & Bo, W. (2009). Spatiotemporal analysis of rural–urban land conversion. International Journal of Geographical Information Science, 23(3), 379–398.
 Hussain, M., Chen, D., Cheng, A., Wei, H., & Stanley, D. (2013). Change detection from remotely sensed images: From pixelbased to objectbased approaches. ISPRS Journal of Photogrammetry and Remote Sensing, 80, 91–106.
 Inglada, J., & Mercier, G. (2007). A new statistical similarity measure for change detection in multitemporal SAR images and its extension to multiscale change analysis. IEEE Transactions on Geoscience and Remote Sensing, 45(5), 1432–1445.
 Jianya, G., Haigang, S., Guorui, M., & Qiming, Z. (2008). A review of multitemporal remote sensing data change detection algorithms. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 37(B7), 757–762.
 Lu, D., Mausel, P., Brondízio E., & Moran, E. (2004). Change detection techniques. International Journal of Remote Sensing, 25(12), 2365–2401.
 Ludus, U., Bayarsaikhan, U., Bazartseren, B., KyungRyul, K., Park, K., & Lee, D. (2009). Change detection and classification of land cover at Hustai National Park in Mongolia. International Journal of Applied Earth Observation and Geoinformation, 11(4), 273–280.
 Masek, J. G., Vermote, E. F., Nazmi, E., Forrest, W., Hall, G., Huemmrich, K.F., . . . Lim, T. (2006). A Landsat surface reflectance dataset for North America, 1990–2000. IEEE Geoscience and Remote Sensing Letters, 3(1), 68–72.
 Yang, C., & Lo, P. (2002). Using a time series of satellite imagery to detect land use and land cover changes in the Atlanta, Georgia, metropolitan area. International Journal of Remote Sensing, 23(9), 1775–1798.
 Yuan, F., Kali, E., Loeffelholz, B. C., & Bauer, M. E. (2005). Land cover classification and change analysis of the Twin Cities (Minnesota) metropolitan area by multitemporal Landsat remote sensing. Remote Sensing of Environment, 98(2), 317–328.
 Zerrouki N., & Bouchaffra, D. (2014). Pixelbased or objectbased: Which approach is more appropriate for remote sensing image classification? In IEEE International Conference on Systems, Man and Cybernetics.
References
 Benedek, C., & Tamás, S. (2009). Change detection in optical aerial images by a multilayer conditional mixed Markov model. IEEE Transactions on Geoscience and Remote Sensing, 47(10), 3416–3430.
 Bengio, Y., & Frasconi, P. (1996). Inputoutput HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5), 1231–1249.
 Bouchaffra, D. (2010). Conformationbased hidden Markov models: Application to human face identification. IEEE Transactions on Neural Networks, 21(4), 597–608.
 Bouchaffra, D. (2012). Mapping dynamic Bayesian networks to $\text{\sigma}$shapes: Application to human faces identification across ages. IEEE Transactions on Neural Networks and Learning Systems, 23(8), 1229–1241.
 Bouchaffra, D., Govindaraju, V., & Srihari, S. (1999). Postprocessing of recognized strings using nonstationary Markovian models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), 990–999.
 Dai, X. L., & Khorram, S. (1999). Remotely sensed change detection based on artificial neural networks. Photogrammetric Engineering and Remote Sensing, 65, 1187–1194.
 DeAlarcon, P. E., PascualMontana, A., Gupta, A., & Carazo, J. M. (2002). Modeling shape and topology of lowresolution density maps of biological macromolecules. Biophysical Journal, 83, 619–632.
 Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification. New York, NY: Wiley.
 Du, P., Xia, J., & Li, F. (2015). Monitoring urban impervious surface area change using ChinaBrazil earth resources satellites and HJ1 remote sensing images. Journal of Applied Remote Sensing, 9(1), 096094.
 Edelsbrunner, H. (2014). A short course in computational geometry and topology. Springer Briefs in Applied Sciences and Technology. Cham, Switzerland: Springer.
 Edelsbrunner, H., & Harer, J. L. (2010). Computational topology: An introduction. Providence, Rhode Island: American Mathematical Society.
 Fine, S., Singer, Y., & Tishby, N. (1998). The hierarchical hidden Markov model: Analysis and applications. Machine Learning, 32, 41–62.
 Gong, M., Su, L., Jia, M., & Chen, W. (2014). Fuzzy clustering with modified Markov random field energy function for change detection in synthetic aperture radar images. IEEE Transactions on Fuzzy Systems, 22(1), 98–109.
 Gong, M., Zhao, J., & Liu, J. (2016). Change detection in synthetic aperture radar images based on deep neural networks. IEEE Transactions on Neural Networks and Learning Systems, 27(1), 125–138.
 Goodfellow, I., Bengio, Y., & Courteville, A. (2016). Deep learning. Cambridge: MIT press.
 Gorur, D., & Rasmussen, C. E. (2010). Dirichlet process mixture models: Choice of the base distribution. Computer Science and Technology, 25(4), 615–626.
 Hajij, M., Wang, B., & Scheidegger, C. (2018). Visual detection of structural changes in timevarying graphs using persistent homology. Proceedings of the IEEE Pacific Visualization Symposium, 125–134.
 Li, W., & Guo, Q. (2014). A new accuracy assessment method for oneclass remote sensing classification. IEEE Transactions on Geoscience and Remote Sensing, 52(8), 4621–4632.
 Lu, D., Mausel, P., Batistella, M., & Moran, E. (2005). Land‐cover binary change detection methods for use in the moist tropical region of the Amazon: A comparative study. International Journal of Remote Sensing, 26(1), 101–114.
 Lv, P., Zhong, Y., Zhao, J., Jiao, H., & Zhang, L. (2016). Change detection based on a multifeature probabilistic ensemble conditional random field model for high spatial resolution remote sensing imagery. IEEE Geoscience and Remote Sensing Letters, 13(12), 1965–1969
 MATLAB. (2015). Version 8.6.0 (R2015b). Natick, MA: The MathWorks Inc.
 Prasolov, V. V. (2007). Elements of homology theory. Graduate Studies in Mathematics 81, Providence, RI: American Mathematical Society.
 Rabiner, L., & Juang, B. H. (1993). Fundamentals of speech recognition. Upper Saddle River, NJ: PrenticeHall.
 Sanches, I. (2000). Noisecompensated hidden Markov models. IEEE Transactions on Speech and Audio Processing, 8(5), 533–540.
 Singh, P., Kato, Z., Zoltan, C., & Zerubia, J. (2014). A multilayer Markovian model for change detection in aerial image pairs with large time differences. Proceedings of the IEEE 22^{nd} International Conference on Pattern Recognition (ICPR’14), 924–929.
 Volpi, M., Tuia, D., Bovolo, F., Kanevski, M., & Bruzzone, L. (2013). Supervised change detection in VHR images using contextual information and support vector machines. International Journal of Applied Earth Observation and Geoinformation, 20, 77–85.
 Zhong, Y., Liu, W., Zhao, J., & Zhang, L. (2015). Change detection based on pulsecoupled neural networks and the NMI feature for high spatial resolution remote sensing imagery. IEEE Geosciences and Remote Sensing Letters, 12(3), 537–541.