Time Series Clustering¶
Clustering is the task of grouping together similar objects. This task hence heavily relies on the notion of similarity one relies on.
The following Figure illustrates why choosing an adequate similarity function is key (code to reproduce is available in the Gallery of Examples).
This Figure is the result of a \(k\)means clustering that uses Euclidean distance as a base metric. One issue with this metric is that it is not invariant to time shifts, while the dataset at stake clearly holds such invariants.
\(k\)means and Dynamic Time Warping¶
To overcome the previously illustrated issue, distance metrics dedicated to time series, such as Dynamic Time Warping (DTW), are required. As can be seen in the Figure below, the use of such metrics produce more meaningful results.
The tslearn.clustering
module in tslearn
offers an
option to use DTW as the core metric in a \(k\)means algorithm,
which leads to better clusters and centroids:
First, clusters gather time series of similar shapes, which is due to the ability of Dynamic Time Warping (DTW) to deal with time shifts, as explained above. Second, cluster centers (aka centroids) are computed as the barycenters with respect to DTW, hence they allow to retrieve a sensible average shape whatever the temporal shifts in the cluster (see our dedicated User Guide section for more details on how these barycenters are computed).
In tslearn
, clustering a time series dataset with \(k\)means and a
dedicated time series metric is as easy as
from tslearn.clustering import TimeSeriesKMeans
model = TimeSeriesKMeans(n_clusters=3, metric="dtw",
max_iter=10, random_state=seed)
model.fit(X_train)
where X_train
is the considered unlabelled dataset of time series.
The metric
parameter can also be set to "softdtw"
as an alternative
time series metric (cf.
our User Guide section on softDTW).
Kernel \(k\)means and Time Series Kernels¶
Another option to deal with such time shifts is to rely on the kernel trick. Indeed, [1] introduces a positive semidefinite kernel for time series, inspired from DTW. Then, the kernel \(k\)means algorithm [2], that is equivalent to a \(k\)means that would operate in the Reproducing Kernel Hilbert Space associated to the chosen kernel, can be used:
A first significant difference (when compared to \(k\)means) is that cluster centers are never computed explicitly, hence time series assignments to cluster are the only kind of information available once the clustering is performed.
Second, one should note that the clusters generated by kernel\(k\)means are phase dependent (see clusters 2 and 3 that differ in phase rather than in shape). This is because Global Alignment Kernel is not invariant to time shifts, as demonstrated in [3] for the closely related softDTW [4].
Examples Using Clustering Estimators¶
References¶
[1] 

[2]  I. S. Dhillon, Y. Guan & B. Kulis. “Kernel kmeans, Spectral Clustering and Normalized Cuts,” KDD 2004. 
[3]  H. Janati, M. Cuturi, A. Gramfort. “SpatioTemporal Alignments: Optimal transport through space and time,” AISTATS 2020 
[4]  M. Cuturi, M. Blondel “SoftDTW: a Differentiable Loss Function for TimeSeries,” ICML 2017. 