3. Kernel Methods#
In the following, we will discuss the use of kernels to compare time series. A kernel \(k(\cdot, \cdot)\) is such that there exists an unknown map \(\Phi\) such that:
i.e. \(k(\cdot, \cdot)\) is the inner product between \(\mathbf{x}\) and \(\mathbf{y}\) in some (unknown) embedding space \(\mathcal{H}\). In practice, \(k(\mathbf{x}, \mathbf{y})\) will be large when \(\mathbf{x}\) and \(\mathbf{y}\) are similar and close to 0 when they are very dissimilar.
A large number of kernel methods from the machine learning literature rely on the so-called kernel trick, that consists in performing computations in the embedding space \(\mathcal{H}\) without ever actually performing any embedding. As an example, one can compute distance between \(\mathbf{x}\) and \(\mathbf{y}\) in \(\mathcal{H}\) via:
Such computations are used, for example, in the kernel-\(k\)-means algorithm (see below).
3.1. Global Alignment Kernel#
The Global Alignment Kernel (GAK) is a kernel that operates on time series.
The unnormalized GAK is defined, for a given bandwidth \(\sigma\), as:
where \(\mathcal{A}(\mathbf{x}, \mathbf{y})\) is the set of all possible alignments between series \(\mathbf{x}\) and \(\mathbf{y}\).
Note that the function gak is normalized in tslearn: it corresponds to the quotient
This normalization ensures that \(\text{gak}(\mathbf{x}, \mathbf{x})=1\) for all \(\mathbf{x}\) and \(\text{gak}(\mathbf{x}, \mathbf{y}) \in [0, 1]\) for all \(\mathbf{x}, \mathbf{y}\).
It is advised in [1] to set the bandwidth \(\sigma\) as a multiple of a
simple estimate of the median distance of different points observed in
different time-series of your training set, scaled by the square root of the
median length of time-series in the set.
This estimate is made available in tslearn through
sigma_gak:
from tslearn.metrics import gak, sigma_gak
sigma = sigma_gak(X)
k_01 = gak(X[0], X[1], sigma=sigma)
Note however that, on long time series, this estimate can lead to numerical overflows, which smaller values can avoid.
Finally, the unnormalized GAK is related to softDTW [3] through the following formula:
where \(\gamma\) is the hyper-parameter controlling softDTw smoothness, which is related to the bandwidth parameter of GAK through \(\gamma = 2 \sigma^2\).
3.2. Clustering and Classification#
Kernel \(k\)-means [2] is a method that uses the kernel trick to implicitly perform \(k\)-means clustering in the embedding space associated to a kernel. This method is discussed in our User Guide section dedicated to clustering.
Kernels can also be used in classification settings.
tslearn.svm offers implementations of Support Vector Machines (SVM)
that accept GAK as a kernel.
This implementation heavily relies on scikit-learn and libsvm.
One implication is that predict_proba and predict_log_proba methods
are computed based on cross-validation probability estimates, which has two
main implications, as discussed in more details in scikit-learn’s
user guide:
1. setting the constructor option probability to True makes the fit
step longer since it then relies on an expensive five-fold cross-validation;
2. the probability estimates obtained through predict_proba may be
inconsistent with the scores provided by decision_function and the
predicted class output by predict.