Time Series Clustering

Clustering is the task of grouping together similar objects. This task hence heavily relies on the notion of similarity one relies on.

The following Figure illustrates why choosing an adequate similarity function is key (code to reproduce is available in the Gallery of Examples).


\(k\)-means clustering with Euclidean distance. Each subfigure represents series from a given cluster and their centroid (in red).

This Figure is the result of a \(k\)-means clustering that uses Euclidean distance as a base metric. One issue with this metric is that it is not invariant to time shifts, while the dataset at stake clearly holds such invariants.

\(k\)-means and Dynamic Time Warping

To overcome the previously illustrated issue, distance metrics dedicated to time series, such as Dynamic Time Warping (DTW), are required. As can be seen in the Figure below, the use of such metrics produce more meaningful results.

The tslearn.clustering module in tslearn offers an option to use DTW as the core metric in a \(k\)-means algorithm, which leads to better clusters and centroids:


\(k\)-means clustering with Dynamic Time Warping. Each subfigure represents series from a given cluster and their centroid (in red).

First, clusters gather time series of similar shapes, which is due to the ability of Dynamic Time Warping (DTW) to deal with time shifts, as explained above. Second, cluster centers (aka centroids) are computed as the barycenters with respect to DTW, hence they allow to retrieve a sensible average shape whatever the temporal shifts in the cluster (see our dedicated User Guide section for more details on how these barycenters are computed).

In tslearn, clustering a time series dataset with \(k\)-means and a dedicated time series metric is as easy as

from tslearn.clustering import TimeSeriesKMeans

model = TimeSeriesKMeans(n_clusters=3, metric="dtw",
                         max_iter=10, random_state=seed)

where X_train is the considered unlabelled dataset of time series. The metric parameter can also be set to "softdtw" as an alternative time series metric (cf. our User Guide section on soft-DTW).

Kernel \(k\)-means and Time Series Kernels

Another option to deal with such time shifts is to rely on the kernel trick. Indeed, [1] introduces a positive semidefinite kernel for time series, inspired from DTW. Then, the kernel \(k\)-means algorithm [2], that is equivalent to a \(k\)-means that would operate in the Reproducing Kernel Hilbert Space associated to the chosen kernel, can be used:


Kernel \(k\)-means clustering with Global Alignment Kernel. Each subfigure represents series from a given cluster.

A first significant difference (when compared to \(k\)-means) is that cluster centers are never computed explicitly, hence time series assignments to cluster are the only kind of information available once the clustering is performed.

Second, one should note that the clusters generated by kernel-\(k\)-means are phase dependent (see clusters 2 and 3 that differ in phase rather than in shape). This is because Global Alignment Kernel is not invariant to time shifts, as demonstrated in [3] for the closely related soft-DTW [4].

Examples Using Clustering Estimators


  1. Cuturi. “Fast Global Alignment Kernels,” ICML 2011.
[2]I. S. Dhillon, Y. Guan & B. Kulis. “Kernel k-means, Spectral Clustering and Normalized Cuts,” KDD 2004.
[3]H. Janati, M. Cuturi, A. Gramfort. “Spatio-Temporal Alignments: Optimal transport through space and time,” AISTATS 2020
[4]M. Cuturi, M. Blondel “Soft-DTW: a Differentiable Loss Function for Time-Series,” ICML 2017.