Note
Go to the end to download the full example code
Nearest neighbors¶
This example illustrates the use of nearest neighbor methods for database search and classification tasks.
The three-nearest neighbors of the time series from a test set are computed. Then, the predictive performance of a three-nearest neighbors classifier [1] is computed with three different metrics: Dynamic Time Warping [2], Euclidean distance and SAX-MINDIST [3].
[1] Wikipedia entry for the k-nearest neighbors algorithm
[2] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition”. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43-49 (1978).
[3] J. Lin, E. Keogh, L. Wei and S. Lonardi, “Experiencing SAX: a novel symbolic representation of time series”. Data Mining and Knowledge Discovery, 15(2), 107-144 (2007).
1. Nearest neighbour search
Computed nearest neighbor indices (wrt DTW)
[[10 12 2]
[ 0 13 5]
[ 0 1 13]
[ 0 11 5]
[16 18 12]
[ 3 17 9]
[12 2 16]
[ 7 3 17]
[12 2 10]
[12 2 18]
[12 8 2]
[ 3 17 7]
[18 19 2]
[ 0 17 13]
[ 9 3 7]
[12 2 8]
[ 3 7 9]
[ 0 1 13]
[18 10 2]
[10 12 2]]
First nearest neighbor class: [0 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0]
2. Nearest neighbor classification using DTW
Correct classification rate: 1.0
3. Nearest neighbor classification using L2
Correct classification rate: 1.0
4. Nearest neighbor classification using SAX+MINDIST
Correct classification rate: 0.5
# Author: Romain Tavenard
# License: BSD 3 clause
import numpy
from sklearn.metrics import accuracy_score
from tslearn.generators import random_walk_blobs
from tslearn.preprocessing import TimeSeriesScalerMinMax, \
TimeSeriesScalerMeanVariance
from tslearn.neighbors import KNeighborsTimeSeriesClassifier, \
KNeighborsTimeSeries
numpy.random.seed(0)
n_ts_per_blob, sz, d, n_blobs = 20, 100, 1, 2
# Prepare data
X, y = random_walk_blobs(n_ts_per_blob=n_ts_per_blob,
sz=sz,
d=d,
n_blobs=n_blobs)
scaler = TimeSeriesScalerMinMax(value_range=(0., 1.)) # Rescale time series
X_scaled = scaler.fit_transform(X)
indices_shuffle = numpy.random.permutation(n_ts_per_blob * n_blobs)
X_shuffle = X_scaled[indices_shuffle]
y_shuffle = y[indices_shuffle]
X_train = X_shuffle[:n_ts_per_blob * n_blobs // 2]
X_test = X_shuffle[n_ts_per_blob * n_blobs // 2:]
y_train = y_shuffle[:n_ts_per_blob * n_blobs // 2]
y_test = y_shuffle[n_ts_per_blob * n_blobs // 2:]
# Nearest neighbor search
knn = KNeighborsTimeSeries(n_neighbors=3, metric="dtw")
knn.fit(X_train, y_train)
dists, ind = knn.kneighbors(X_test)
print("1. Nearest neighbour search")
print("Computed nearest neighbor indices (wrt DTW)\n", ind)
print("First nearest neighbor class:", y_test[ind[:, 0]])
# Nearest neighbor classification
knn_clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, metric="dtw")
knn_clf.fit(X_train, y_train)
predicted_labels = knn_clf.predict(X_test)
print("\n2. Nearest neighbor classification using DTW")
print("Correct classification rate:", accuracy_score(y_test, predicted_labels))
# Nearest neighbor classification with a different metric (Euclidean distance)
knn_clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, metric="euclidean")
knn_clf.fit(X_train, y_train)
predicted_labels = knn_clf.predict(X_test)
print("\n3. Nearest neighbor classification using L2")
print("Correct classification rate:", accuracy_score(y_test, predicted_labels))
# Nearest neighbor classification based on SAX representation
metric_params = {'n_segments': 10, 'alphabet_size_avg': 5}
knn_clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, metric="sax",
metric_params=metric_params)
knn_clf.fit(X_train, y_train)
predicted_labels = knn_clf.predict(X_test)
print("\n4. Nearest neighbor classification using SAX+MINDIST")
print("Correct classification rate:", accuracy_score(y_test, predicted_labels))
Total running time of the script: (0 minutes 1.203 seconds)