tslearn.clustering.TimeSeriesKMeans

class tslearn.clustering.TimeSeriesKMeans(n_clusters=3, max_iter=50, tol=1e-06, n_init=1, metric='euclidean', max_iter_barycenter=100, metric_params=None, n_jobs=None, dtw_inertia=False, verbose=0, random_state=None, init='k-means++')[source]

K-means clustering for time-series data.

Parameters:
n_clusters : int (default: 3)

Number of clusters to form.

max_iter : int (default: 50)

Maximum number of iterations of the k-means algorithm for a single run.

tol : float (default: 1e-6)

Inertia variation threshold. If at some point, inertia varies less than this threshold between two consecutive iterations, the model is considered to have converged and the algorithm stops.

n_init : int (default: 1)

Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.

metric : {“euclidean”, “dtw”, “softdtw”} (default: “euclidean”)

Metric to be used for both cluster assignment and barycenter computation. If “dtw”, DBA is used for barycenter computation.

max_iter_barycenter : int (default: 100)

Number of iterations for the barycenter computation process. Only used if metric=”dtw” or metric=”softdtw”.

metric_params : dict or None (default: None)

Parameter values for the chosen metric. For metrics that accept parallelization of the cross-distance matrix computations, the n_jobs key passed in metric_params is overridden by the n_jobs argument.

n_jobs : int or None, optional (default=None)

The number of jobs to run in parallel for cross-distance matrix computations. Ignored if the cross-distance matrix cannot be computed using parallelization. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See scikit-learn’s Glossary for more details.

dtw_inertia : bool (default: False)

Whether to compute DTW inertia even if DTW is not the chosen metric.

verbose : int (default: 0)

If nonzero, information about the inertia is printed while the model is being learned, and joblib progress messages are printed.

random_state : integer or numpy.RandomState, optional

Generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.

init : {‘k-means++’, ‘random’ or an ndarray} (default: ‘k-means++’)

Method for initialization:

‘k-means++’ : use the k-means++ heuristic. See scikit-learn’s k_init_ for more.

‘random’ : choose k observations (rows) at random from data for the initial centroids.

If an ndarray is passed, it should be of shape (n_clusters, ts_size, d) and gives the initial centers.

Attributes:
labels_ : numpy.ndarray

Labels of each point.

cluster_centers_ : numpy.ndarray of shape (n_clusters, sz, d)

Cluster centers. sz is the size of the time series used at fit time if the init method is ‘k-means++’ or ‘random’, and the size of the longest initial centroid if those are provided as an ndarray through the init parameter.

inertia_ : float

Sum of distances of samples to their closest cluster center.

n_iter_ : int

The number of iterations performed during fit.

Notes

If metric is set to “euclidean”, the algorithm expects a dataset of equal-sized time series.

Examples

>>> from tslearn.clustering import TimeSeriesKMeans
>>> from tslearn.generators import random_walks
>>> X = random_walks(n_ts=50, sz=32, d=1)
>>> km = TimeSeriesKMeans(n_clusters=3, metric="euclidean", max_iter=5,
...                       random_state=0).fit(X)
>>> km.cluster_centers_.shape
(3, 32, 1)
>>> km_dba = TimeSeriesKMeans(n_clusters=3, metric="dtw", max_iter=5,
...                           max_iter_barycenter=5,
...                           random_state=0).fit(X)
>>> km_dba.cluster_centers_.shape
(3, 32, 1)
>>> km_sdtw = TimeSeriesKMeans(n_clusters=3, metric="softdtw", max_iter=5,
...                            max_iter_barycenter=5,
...                            metric_params={"gamma": .5},
...                            random_state=0).fit(X)
>>> km_sdtw.cluster_centers_.shape
(3, 32, 1)
>>> from tslearn.utils import to_time_series_dataset
>>> X_bis = to_time_series_dataset([[1, 2, 3, 4],
...                                 [1, 2, 3],
...                                 [2, 5, 6, 7, 8, 9]])
>>> km = TimeSeriesKMeans(n_clusters=2, max_iter=5,
...                       metric="dtw", random_state=0).fit(X_bis)
>>> km.cluster_centers_.shape
(2, 6, 1)

Methods

fit(self, X[, y]) Compute k-means clustering.
fit_predict(self, X[, y]) Fit k-means clustering using X and then predict the closest cluster each time series in X belongs to.
fit_transform(self, X[, y]) Fit to data, then transform it.
from_hdf5(path) Load model from an HDF5 file.
from_json(path) Load model from a JSON file.
from_pickle(path) Load model from a pickle file.
get_params(self[, deep]) Get parameters for this estimator.
predict(self, X) Predict the closest cluster each time series in X belongs to.
set_params(self, **params) Set the parameters of this estimator.
to_hdf5(self, path) Save model to an HDF5 file.
to_json(self, path) Save model to a JSON file.
to_pickle(self, path) Save model to a pickle file.
transform(self, X) Transform X to a cluster-distance space.
fit(self, X, y=None)[source]

Compute k-means clustering.

Parameters:
X : array-like of shape=(n_ts, sz, d)

Time series dataset.

y

Ignored

fit_predict(self, X, y=None)[source]

Fit k-means clustering using X and then predict the closest cluster each time series in X belongs to.

It is more efficient to use this method than to sequentially call fit and predict.

Parameters:
X : array-like of shape=(n_ts, sz, d)

Time series dataset to predict.

y

Ignored

Returns:
labels : array of shape=(n_ts, )

Index of the cluster each sample belongs to.

fit_transform(self, X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
y : ndarray of shape (n_samples,), default=None

Target values.

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

classmethod from_hdf5(path)[source]

Load model from an HDF5 file. Requires h5py (http://docs.h5py.org/).

Parameters:
path : str

Full path to file.

Returns:
Model instance
classmethod from_json(path)[source]

Load model from a JSON file.

Parameters:
path : str

Full path to file.

Returns:
Model instance
classmethod from_pickle(path)[source]

Load model from a pickle file.

Parameters:
path : str

Full path to file.

Returns:
Model instance
get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

predict(self, X)[source]

Predict the closest cluster each time series in X belongs to.

Parameters:
X : array-like of shape=(n_ts, sz, d)

Time series dataset to predict.

Returns:
labels : array of shape=(n_ts, )

Index of the cluster each sample belongs to.

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : object

Estimator instance.

to_hdf5(self, path)[source]

Save model to an HDF5 file. Requires h5py (http://docs.h5py.org/).

Parameters:
path : str

Full file path. File must not already exist.

Raises:
FileExistsError

If a file with the same path already exists.

to_json(self, path)[source]

Save model to a JSON file.

Parameters:
path : str

Full file path.

to_pickle(self, path)[source]

Save model to a pickle file.

Parameters:
path : str

Full file path.

transform(self, X)[source]

Transform X to a cluster-distance space.

In the new space, each dimension is the distance to one of the cluster centers.

Parameters:
X : array-like of shape=(n_ts, sz, d)

Time series dataset.

Returns:
distances : array of shape=(n_ts, n_clusters)

Distances to cluster centers.
