# Kernel Methods¶

In the following, we will discuss the use of kernels to compare time series. A kernel \(k(\cdot, \cdot)\) is such that there exists an unknown map \(\Phi\) such that:

i.e. \(k(\cdot, \cdot)\) is the inner product between \(\mathbf{x}\) and \(\mathbf{y}\) in some (unknown) embedding space \(\mathcal{H}\). In practice, \(k(\mathbf{x}, \mathbf{y})\) will be large when \(\mathbf{x}\) and \(\mathbf{y}\) are similar and close to 0 when they are very dissimilar.

A large number of kernel methods from the machine learning literature rely on the so-called kernel trick, that consists in performing computations in the embedding space \(\mathcal{H}\) without ever actually performing any embedding. As an example, one can compute distance between \(\mathbf{x}\) and \(\mathbf{y}\) in \(\mathcal{H}\) via:

Such computations are used, for example, in the kernel-\(k\)-means algorithm (see below).

## Global Alignment Kernel¶

The Global Alignment Kernel (GAK) is a kernel that operates on time series.

It is defined, for a given bandwidth \(\sigma\), as:

where \(\mathcal{A}(\mathbf{x}, \mathbf{y})\) is the set of all possible alignments between series \(\mathbf{x}\) and \(\mathbf{y}\).

It is advised in [1] to set the bandwidth \(\sigma\) as a multiple of a
simple estimate of the median distance of different points observed in
different time-series of your training set, scaled by the square root of the
median length of time-series in the set.
This estimate is made available in `tslearn`

through
tslearn.metrics.sigma_gak:

```
from tslearn.metrics import gak, sigma_gak
sigma = sigma_gak(X)
k_01 = gak(X[0], X[1], sigma=sigma)
```

Note however that, on long time series, this estimate can lead to numerical overflows, which smaller values can avoid.

Finally, GAK is related to softDTW [3] through the following formula:

where \(\gamma\) is the hyper-parameter controlling softDTw smoothness, which is related to the bandwidth parameter of GAK through \(\gamma = 2 \sigma^2\).

## Clustering and Classification¶

Kernel \(k\)-means [2] is a method that uses the kernel trick to implicitly perform \(k\)-means clustering in the embedding space associated to a kernel. This method is discussed in our User Guide section dedicated to clustering.

Kernels can also be used in classification settings.
`tslearn.svm`

offers implementations of Support Vector Machines (SVM)
that accept GAK as a kernel.
This implementation heavily relies on `scikit-learn`

and `libsvm`

.
One implication is that `predict_proba`

and `predict_log_proba`

methods
are computed based on cross-validation probability estimates, which has two
main implications, as discussed in more details in `scikit-learn`

’s
user guide:

1. setting the constructor option `probability`

to `True`

makes the `fit`

step longer since it then relies on an expensive five-fold cross-validation;

2. the probability estimates obtained through `predict_proba`

may be
inconsistent with the scores provided by `decision_function`

and the
predicted class output by `predict`

.