k-means

This example applies \(k\)-means clustering to time series. Three variants of the algorithm are available: standard Euclidean \(k\)-means, DBA-\(k\)-means (which compares series with DTW and computes cluster centers by DTW Barycenter Averaging), and Soft-DTW \(k\)-means.
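To make the difference between the Euclidean and DTW-based variants concrete, here is a minimal pure-Python sketch of both distances. The helper names `euclidean` and `dtw` are illustrative and are not tslearn functions. The DTW definition used (square root of the minimal accumulated squared difference over all warping paths) is the usual textbook one and is assumed here, not taken from this page.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """DTW distance via dynamic programming (illustrative helper)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated squared difference aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


# Two series with the same peak, shifted by one time step.
a = [0.0, 0.0, 1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0, 0.0, 0.0]

print(euclidean(a, b))  # penalises the shift
print(dtw(a, b))        # the shift is absorbed by the alignment
```

Because DTW may align one time step with several, a shifted peak costs nothing here, whereas the Euclidean distance compares the series index by index.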

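[Figure: 3-by-3 grid of plots, one row per variant (Euclidean \(k\)-means, DBA \(k\)-means, Soft-DTW \(k\)-means) and one panel per cluster; member series are drawn in black and the cluster center in red.]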

Out:

Euclidean k-means
16.434 --> 9.437 --> 9.437 -->
DBA k-means
Init 1
1.061 --> 0.473 --> 0.473 --> 0.473 -->
Soft-DTW k-means
2.475 --> 0.158 --> 0.157 --> 0.157 --> 0.157 --> 0.157 --> 0.157 --> 0.157 --> 0.157 --> 0.157 --> 0.157 --> 0.157 --> 0.157 --> 0.158 --> 0.157 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 --> 0.156 -->
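Each trace above appears to be the clustering inertia printed after every iteration when `verbose=True`. The following NumPy sketch of plain Lloyd iterations reproduces that kind of trace on synthetic data. It is not tslearn's implementation: the function name `euclidean_kmeans`, the random initialisation, and the choice of inertia as the mean squared distance to the closest center are all assumptions made for illustration.

```python
import numpy as np


def euclidean_kmeans(X, n_clusters, n_iter=10, seed=0):
    """Plain Lloyd iterations on an array of shape (n_series, series_length)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise centers with distinct, randomly chosen series.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    history = []
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from each series to each center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Inertia (assumed definition): mean squared distance to the closest center.
        history.append(d2[np.arange(len(X)), labels].mean())
        # Update step: each center becomes the mean of its members.
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers, history


# Synthetic data: two noisy groups of 10 series, 40 time steps each.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 40)
X = np.vstack([
    np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((10, 40)),
    np.cos(2 * np.pi * t) + 0.1 * rng.standard_normal((10, 40)),
])

labels, centers, history = euclidean_kmeans(X, n_clusters=2)
print(" --> ".join(f"{h:.3f}" for h in history))
```

With this definition the inertia cannot increase from one iteration to the next, which matches the quick plateau of the Euclidean trace. The small fluctuations in the Soft-DTW trace are not covered by this sketch.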

# Author: Romain Tavenard
# License: BSD 3 clause

import numpy
import matplotlib.pyplot as plt

from tslearn.clustering import TimeSeriesKMeans
from tslearn.datasets import CachedDatasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance, TimeSeriesResampler

seed = 0
numpy.random.seed(seed)
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
X_train = X_train[y_train < 4]  # Keep first 3 classes
numpy.random.shuffle(X_train)
X_train = TimeSeriesScalerMeanVariance().fit_transform(X_train[:50])  # Keep only 50 time series
X_train = TimeSeriesResampler(sz=40).fit_transform(X_train)  # Make time series shorter
sz = X_train.shape[1]

# Euclidean k-means
print("Euclidean k-means")
km = TimeSeriesKMeans(n_clusters=3, verbose=True, random_state=seed)
y_pred = km.fit_predict(X_train)

plt.figure()
for yi in range(3):
    plt.subplot(3, 3, yi + 1)
    for xx in X_train[y_pred == yi]:
        plt.plot(xx.ravel(), "k-", alpha=.2)
    plt.plot(km.cluster_centers_[yi].ravel(), "r-")
    plt.xlim(0, sz)
    plt.ylim(-4, 4)
    if yi == 1:
        plt.title("Euclidean $k$-means")

# DBA-k-means
print("DBA k-means")
dba_km = TimeSeriesKMeans(n_clusters=3, n_init=2, metric="dtw", verbose=True,
                          max_iter_barycenter=10, random_state=seed)
y_pred = dba_km.fit_predict(X_train)

for yi in range(3):
    plt.subplot(3, 3, 4 + yi)
    for xx in X_train[y_pred == yi]:
        plt.plot(xx.ravel(), "k-", alpha=.2)
    plt.plot(dba_km.cluster_centers_[yi].ravel(), "r-")
    plt.xlim(0, sz)
    plt.ylim(-4, 4)
    if yi == 1:
        plt.title("DBA $k$-means")

# Soft-DTW-k-means
print("Soft-DTW k-means")
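# Note: newer tslearn releases appear to name this key "gamma" rather than
# "gamma_sdtw"; check the installed version.
sdtw_km = TimeSeriesKMeans(n_clusters=3, metric="softdtw", metric_params={"gamma_sdtw": .01},
                           verbose=True, random_state=seed)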
y_pred = sdtw_km.fit_predict(X_train)

for yi in range(3):
    plt.subplot(3, 3, 7 + yi)
    for xx in X_train[y_pred == yi]:
        plt.plot(xx.ravel(), "k-", alpha=.2)
    plt.plot(sdtw_km.cluster_centers_[yi].ravel(), "r-")
    plt.xlim(0, sz)
    plt.ylim(-4, 4)
    if yi == 1:
        plt.title("Soft-DTW $k$-means")

plt.tight_layout()
plt.show()
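The Soft-DTW variant in the script takes a smoothing parameter (set to .01). In the usual formulation of Soft-DTW, which is assumed here and not taken from this page, the hard minimum in the DTW recursion is replaced by a smoothed minimum controlled by that parameter. The sketch below shows only this soft-min operator; `soft_min` is a hypothetical helper, not a tslearn function.

```python
import math


def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma)))."""
    # Factor out the hard minimum first so the exponentials cannot overflow.
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


values = [1.0, 1.2, 3.0]
for gamma in (1.0, 0.1, 0.01):
    # The result approaches min(values) as gamma shrinks.
    print(gamma, soft_min(values, gamma))
```

A small value such as .01 therefore keeps the smoothed distance close to ordinary DTW while remaining differentiable, which is what makes gradient-based barycenter computation possible.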

Total running time of the script: (0 minutes 11.889 seconds)
