# Kernel Methods

In the following, we will discuss the use of kernels to compare time series. A kernel \(k(\cdot, \cdot)\) is such that there exists an unknown map \(\Phi\) such that:

\[
k(\mathbf{x}, \mathbf{y}) = \left\langle \Phi(\mathbf{x}), \Phi(\mathbf{y}) \right\rangle_{\mathcal{H}}
\]

i.e. \(k(\mathbf{x}, \mathbf{y})\) is the inner product between \(\Phi(\mathbf{x})\) and \(\Phi(\mathbf{y})\) in some (unknown) embedding space \(\mathcal{H}\). In practice, \(k(\mathbf{x}, \mathbf{y})\) will be large when \(\mathbf{x}\) and \(\mathbf{y}\) are similar and close to 0 when they are very dissimilar.

A large number of kernel methods from the machine learning literature rely on the so-called kernel trick, which consists in performing computations in the embedding space \(\mathcal{H}\) without ever actually performing any embedding. As an example, one can compute the distance between \(\mathbf{x}\) and \(\mathbf{y}\) in \(\mathcal{H}\) via:

\[
\left\| \Phi(\mathbf{x}) - \Phi(\mathbf{y}) \right\|_{\mathcal{H}} = \sqrt{k(\mathbf{x}, \mathbf{x}) + k(\mathbf{y}, \mathbf{y}) - 2 k(\mathbf{x}, \mathbf{y})}
\]

Such computations are used, for example, in the kernel-\(k\)-means algorithm (see below).
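As an illustration (not part of `tslearn`), the kernel trick can be sketched in a few lines of pure Python: a Gaussian (RBF) kernel on plain vectors stands in for \(k\), and distances in \(\mathcal{H}\) are obtained from kernel evaluations only, without ever constructing \(\Phi\).

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def kernel_distance(x, y, kernel):
    """Distance between Phi(x) and Phi(y) in H, computed purely
    from kernel evaluations (the kernel trick)."""
    return math.sqrt(kernel(x, x) + kernel(y, y) - 2 * kernel(x, y))
```

Note that `kernel_distance` never touches the embedding space directly: it only needs the three kernel values \(k(\mathbf{x}, \mathbf{x})\), \(k(\mathbf{y}, \mathbf{y})\) and \(k(\mathbf{x}, \mathbf{y})\).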

## Global Alignment Kernel

The Global Alignment Kernel (GAK) is a kernel that operates on time series.

It is defined, for a given bandwidth \(\sigma\), as:

\[
k(\mathbf{x}, \mathbf{y}) = \sum_{\pi \in \mathcal{A}(\mathbf{x}, \mathbf{y})} \prod_{i=1}^{|\pi|} \exp \left( - \frac{\left\| x_{\pi_1(i)} - y_{\pi_2(i)} \right\|^2}{2 \sigma^2} \right)
\]

where \(\mathcal{A}(\mathbf{x}, \mathbf{y})\) is the set of all possible alignments between series \(\mathbf{x}\) and \(\mathbf{y}\).
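Although the sum runs over exponentially many alignments, it can be accumulated with a quadratic dynamic program. The sketch below (illustrative only, for 1-D series) uses a plain Gaussian local kernel; `tslearn`'s actual implementation follows the reference GAK formulation, which differs in details such as normalization.

```python
import math

def gak_sketch(x, y, sigma=1.0):
    """Naive O(len(x) * len(y)) recurrence for the Global Alignment
    Kernel on 1-D series, with a plain Gaussian local kernel.
    (Simplified sketch; tslearn's implementation differs in details.)"""
    n, m = len(x), len(y)
    # K[i][j] accumulates local-kernel products over all alignments of
    # x[:i] with y[:j]; the three terms enumerate the possible last steps
    # (advance x, advance y, or advance both) leading to cell (i, j).
    K = [[0.0] * (m + 1) for _ in range(n + 1)]
    K[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.exp(-(x[i - 1] - y[j - 1]) ** 2 / (2 * sigma ** 2))
            K[i][j] = local * (K[i - 1][j] + K[i][j - 1] + K[i - 1][j - 1])
    return K[n][m]
```

For two single-point series there is only one alignment, so the result reduces to the local kernel value itself, which gives a quick sanity check on the recurrence.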

It is advised in [1] to set the bandwidth \(\sigma\) to a multiple of a simple estimate of the median distance between points observed in different time series of the training set, scaled by the square root of the median length of the time series in the set. This estimate is made available in `tslearn` through `tslearn.metrics.sigma_gak`:

```
from tslearn.metrics import gak, sigma_gak
sigma = sigma_gak(X)
k_01 = gak(X[0], X[1], sigma=sigma)
```

Note, however, that on long time series this estimate can lead to numerical overflows, which smaller bandwidth values avoid.

Finally, GAK is related to soft-DTW [3] through the following formula:

\[
k(\mathbf{x}, \mathbf{y}) = \exp \left( - \frac{\text{softDTW}_\gamma(\mathbf{x}, \mathbf{y})}{\gamma} \right)
\]

where \(\gamma\) is the hyper-parameter controlling soft-DTW smoothness, which is related to the bandwidth parameter of GAK through \(\gamma = 2 \sigma^2\).
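This relation can be checked numerically on tiny series by enumerating all alignments explicitly (a toy sketch, not how either quantity is computed in practice): soft-DTW is a soft-minimum of alignment costs, so exponentiating \(-\text{softDTW}_\gamma / \gamma\) recovers exactly the sum over alignments when \(\gamma = 2 \sigma^2\).

```python
import math

def alignments(n, m):
    """All monotonic alignment paths from (0, 0) to (n-1, m-1)."""
    if n == 1 and m == 1:
        return [[(0, 0)]]
    paths = []
    for di, dj in ((1, 0), (0, 1), (1, 1)):
        if n - di >= 1 and m - dj >= 1:
            paths += [p + [(n - 1, m - 1)] for p in alignments(n - di, m - dj)]
    return paths

def path_cost(x, y, path):
    """Sum of squared differences along one alignment path."""
    return sum((x[i] - y[j]) ** 2 for i, j in path)

def gak_brute(x, y, sigma):
    """GAK as an explicit sum over all alignments."""
    return sum(math.exp(-path_cost(x, y, p) / (2 * sigma ** 2))
               for p in alignments(len(x), len(y)))

def soft_dtw_brute(x, y, gamma):
    """Soft-minimum of alignment costs: -gamma * log sum exp(-cost / gamma)."""
    return -gamma * math.log(sum(math.exp(-path_cost(x, y, p) / gamma)
                                 for p in alignments(len(x), len(y))))
```

With \(\gamma = 2 \sigma^2\), `math.exp(-soft_dtw_brute(x, y, gamma) / gamma)` coincides with `gak_brute(x, y, sigma)` up to floating-point error, since both reduce to the same sum of `exp(-cost / gamma)` terms.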

## Clustering and Classification

Kernel \(k\)-means [2] is a method that uses the kernel trick to implicitly perform \(k\)-means clustering in the embedding space associated with a kernel. This method is discussed in our User Guide section dedicated to clustering.
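To make this concrete, here is a minimal from-scratch sketch of two-cluster kernel \(k\)-means (illustrative only; in practice, use `tslearn.clustering.KernelKMeans`). The squared distance from a point to a cluster centroid in \(\mathcal{H}\) is expressed purely in terms of Gram-matrix entries, so no embedding is ever computed.

```python
import math

def rbf(x, y, gamma=0.5):
    """Gaussian kernel on plain vectors (stand-in for a time-series kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_kmeans_2(points, kernel=rbf, n_iter=10):
    """Two-cluster kernel k-means operating only on the Gram matrix."""
    n = len(points)
    K = [[kernel(points[i], points[j]) for j in range(n)] for i in range(n)]

    def dist2(i, members):
        # Squared distance in H from point i to the centroid of `members`:
        # K_ii - 2 * mean_{j in c} K_ij + mean_{j,l in c} K_jl
        m = len(members)
        return (K[i][i]
                - 2.0 * sum(K[i][j] for j in members) / m
                + sum(K[j][l] for j in members for l in members) / m ** 2)

    # Deterministic seeding: point 0, then the point farthest from it in H.
    seeds = [0]
    seeds.append(max(range(n), key=lambda i: K[i][i] - 2 * K[i][0] + K[0][0]))
    labels = [min(range(2),
                  key=lambda c: K[i][i] - 2 * K[i][seeds[c]] + K[seeds[c]][seeds[c]])
              for i in range(n)]

    for _ in range(n_iter):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(2)]
        new_labels = [min((c for c in range(2) if clusters[c]),
                          key=lambda c: dist2(i, clusters[c]))
                      for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

The assignment step is ordinary \(k\)-means, except that every distance is evaluated through `dist2`, i.e. through kernel values alone.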

Kernels can also be used in classification settings. `tslearn.svm` offers implementations of Support Vector Machines (SVM) that accept GAK as a kernel. This implementation heavily relies on `scikit-learn` and `libsvm`. One implication is that the `predict_proba` and `predict_log_proba` methods are computed based on cross-validation probability estimates, which has two main implications, as discussed in more detail in `scikit-learn`'s user guide:

1. setting the constructor option `probability` to `True` makes the `fit` step longer since it then relies on an expensive five-fold cross-validation;

2. the probability estimates obtained through `predict_proba` may be inconsistent with the scores provided by `decision_function` and the predicted class output by `predict`.