tslearn.clustering.TimeSeriesKMeans¶

class tslearn.clustering.TimeSeriesKMeans(n_clusters=3, max_iter=50, tol=1e-06, n_init=1, metric='euclidean', max_iter_barycenter=100, metric_params=None, n_jobs=None, dtw_inertia=False, verbose=0, random_state=None, init='k-means++')[source]¶

K-means clustering for time-series data.
Parameters: - n_clusters : int (default: 3)
Number of clusters to form.
- max_iter : int (default: 50)
Maximum number of iterations of the k-means algorithm for a single run.
- tol : float (default: 1e-6)
Inertia variation threshold. If at some point, inertia varies less than this threshold between two consecutive iterations, the model is considered to have converged and the algorithm stops.
- n_init : int (default: 1)
Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.
- metric : {“euclidean”, “dtw”, “softdtw”} (default: “euclidean”)
Metric to be used for both cluster assignment and barycenter computation. If “dtw”, DBA is used for barycenter computation.
- max_iter_barycenter : int (default: 100)
Number of iterations for the barycenter computation process. Only used if metric=”dtw” or metric=”softdtw”.
- metric_params : dict or None (default: None)
Parameter values for the chosen metric. For metrics that accept parallelization of the cross-distance matrix computations, n_jobs key passed in metric_params is overridden by the n_jobs argument.
- n_jobs : int or None, optional (default=None)
The number of jobs to run in parallel for cross-distance matrix computations. Ignored if the cross-distance matrix cannot be computed using parallelization. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See scikit-learn's Glossary for more details.
- dtw_inertia : bool (default: False)
Whether to compute DTW inertia even if DTW is not the chosen metric.
- verbose : int (default: 0)
If nonzero, information about the inertia is printed while learning the model, and joblib progress messages are printed.
- random_state : integer or numpy.RandomState, optional
Generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
- init : {‘k-means++’, ‘random’ or an ndarray} (default: ‘k-means++’)
Method for initialization:
'k-means++' : use the k-means++ heuristic. See scikit-learn's k_init_ for more.
'random' : choose k observations (rows) at random from the data for the initial centroids.
If an ndarray is passed, it should be of shape (n_clusters, ts_size, d) and gives the initial centers.
Attributes: - labels_ : numpy.ndarray
Labels of each point.
- cluster_centers_ : numpy.ndarray of shape (n_clusters, sz, d)
Cluster centers. sz is the size of the time series used at fit time if the init method is 'k-means++' or 'random', and the size of the longest initial centroid if those are provided as a numpy array through the init parameter.
- inertia_ : float
Sum of distances of samples to their closest cluster center.
- n_iter_ : int
The number of iterations performed during fit.
Notes
If metric is set to “euclidean”, the algorithm expects a dataset of equal-sized time series.
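If your dataset contains series of unequal lengths, one common workaround is to resample them to a common length before clustering with the Euclidean metric. Below is a minimal illustrative sketch (not part of this class's API) using tslearn.preprocessing.TimeSeriesResampler; the series values and target length are arbitrary.
>>> # Illustrative sketch: resample variable-length series to a common length
>>> # so that metric="euclidean" can be used (values and sz are arbitrary).
>>> from tslearn.clustering import TimeSeriesKMeans
>>> from tslearn.preprocessing import TimeSeriesResampler
>>> from tslearn.utils import to_time_series_dataset
>>> X_var = to_time_series_dataset([[1, 2, 3, 4],
...                                 [1, 2, 3],
...                                 [2, 5, 6, 7, 8, 9]])
>>> X_eq = TimeSeriesResampler(sz=10).fit_transform(X_var)
>>> X_eq.shape
(3, 10, 1)
>>> km = TimeSeriesKMeans(n_clusters=2, metric="euclidean",
...                       random_state=0).fit(X_eq)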
Examples
>>> from tslearn.clustering import TimeSeriesKMeans
>>> from tslearn.generators import random_walks
>>> from tslearn.utils import to_time_series_dataset
>>> X = random_walks(n_ts=50, sz=32, d=1)
>>> km = TimeSeriesKMeans(n_clusters=3, metric="euclidean", max_iter=5,
...                       random_state=0).fit(X)
>>> km.cluster_centers_.shape
(3, 32, 1)
>>> km_dba = TimeSeriesKMeans(n_clusters=3, metric="dtw", max_iter=5,
...                           max_iter_barycenter=5,
...                           random_state=0).fit(X)
>>> km_dba.cluster_centers_.shape
(3, 32, 1)
>>> km_sdtw = TimeSeriesKMeans(n_clusters=3, metric="softdtw", max_iter=5,
...                            max_iter_barycenter=5,
...                            metric_params={"gamma": .5},
...                            random_state=0).fit(X)
>>> km_sdtw.cluster_centers_.shape
(3, 32, 1)
>>> X_bis = to_time_series_dataset([[1, 2, 3, 4],
...                                 [1, 2, 3],
...                                 [2, 5, 6, 7, 8, 9]])
>>> km = TimeSeriesKMeans(n_clusters=2, max_iter=5,
...                       metric="dtw", random_state=0).fit(X_bis)
>>> km.cluster_centers_.shape
(2, 6, 1)
Methods
- fit(X[, y]) : Compute k-means clustering.
- fit_predict(X[, y]) : Fit k-means clustering using X and then predict the closest cluster each time series in X belongs to.
- fit_transform(X[, y]) : Fit to data, then transform it.
- from_hdf5(path) : Load model from a HDF5 file.
- from_json(path) : Load model from a JSON file.
- from_pickle(path) : Load model from a pickle file.
- get_params([deep]) : Get parameters for this estimator.
- predict(X) : Predict the closest cluster each time series in X belongs to.
- set_params(**params) : Set the parameters of this estimator.
- to_hdf5(path) : Save model to a HDF5 file.
- to_json(path) : Save model to a JSON file.
- to_pickle(path) : Save model to a pickle file.
- transform(X) : Transform X to a cluster-distance space.
fit(X, y=None)[source]¶
Compute k-means clustering.
Parameters: - X : array-like of shape=(n_ts, sz, d)
Time series dataset.
- y
Ignored
fit_predict(X, y=None)[source]¶
Fit k-means clustering using X and then predict the closest cluster each time series in X belongs to.
It is more efficient to use this method than to sequentially call fit and predict.
Parameters: - X : array-like of shape=(n_ts, sz, d)
Time series dataset to predict.
- y
Ignored
Returns: - labels : array of shape=(n_ts, )
Index of the cluster each sample belongs to.
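A minimal usage sketch with illustrative data (not taken from the original documentation):
>>> # Illustrative sketch: fit on random walks and obtain cluster labels
>>> # in a single call.
>>> from tslearn.clustering import TimeSeriesKMeans
>>> from tslearn.generators import random_walks
>>> X = random_walks(n_ts=50, sz=32, d=1)
>>> labels = TimeSeriesKMeans(n_clusters=3, metric="euclidean",
...                           random_state=0).fit_predict(X)
>>> labels.shape
(50,)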
fit_transform(X, y=None, **fit_params)[source]¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
Returns: - X_new : ndarray array of shape (n_samples, n_features_new)
Transformed array.
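For this estimator, the transformed representation is the cluster-distance space (one column per cluster center). A minimal illustrative sketch with arbitrary data:
>>> # Illustrative sketch: each row holds the distances from one series
>>> # to the 3 cluster centers.
>>> from tslearn.clustering import TimeSeriesKMeans
>>> from tslearn.generators import random_walks
>>> X = random_walks(n_ts=50, sz=32, d=1)
>>> dists = TimeSeriesKMeans(n_clusters=3, metric="euclidean",
...                          random_state=0).fit_transform(X)
>>> dists.shape
(50, 3)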
classmethod from_hdf5(path)[source]¶
Load model from a HDF5 file. Requires h5py (http://docs.h5py.org/).
Parameters: - path : str
Full path to file.
Returns: - Model instance
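A minimal save/load sketch, assuming the optional h5py dependency is installed and that the target file does not already exist (the filename and data are arbitrary):
>>> # Illustrative sketch: round-trip a fitted model through HDF5.
>>> # Assumes h5py is installed and "km.hdf5" does not exist yet.
>>> from tslearn.clustering import TimeSeriesKMeans
>>> from tslearn.generators import random_walks
>>> X = random_walks(n_ts=20, sz=16, d=1)
>>> km = TimeSeriesKMeans(n_clusters=3, random_state=0).fit(X)
>>> km.to_hdf5("km.hdf5")
>>> km_loaded = TimeSeriesKMeans.from_hdf5("km.hdf5")
>>> km_loaded.cluster_centers_.shape
(3, 16, 1)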
classmethod from_json(path)[source]¶
Load model from a JSON file.
Parameters: - path : str
Full path to file.
Returns: - Model instance
classmethod from_pickle(path)[source]¶
Load model from a pickle file.
Parameters: - path : str
Full path to file.
Returns: - Model instance
get_params(deep=True)[source]¶
Get parameters for this estimator.
Parameters: - deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: - params : dict
Parameter names mapped to their values.
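A minimal illustrative sketch:
>>> # Illustrative sketch: inspect the hyper-parameters of an estimator.
>>> from tslearn.clustering import TimeSeriesKMeans
>>> params = TimeSeriesKMeans(n_clusters=4, metric="dtw").get_params()
>>> params["n_clusters"], params["metric"]
(4, 'dtw')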
predict(X)[source]¶
Predict the closest cluster each time series in X belongs to.
Parameters: - X : array-like of shape=(n_ts, sz, d)
Time series dataset to predict.
Returns: - labels : array of shape=(n_ts, )
Index of the cluster each sample belongs to.
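A minimal sketch with illustrative data; the model must be fitted before calling predict:
>>> # Illustrative sketch: assign previously unseen series to the learned clusters.
>>> from tslearn.clustering import TimeSeriesKMeans
>>> from tslearn.generators import random_walks
>>> km = TimeSeriesKMeans(n_clusters=3, random_state=0).fit(
...     random_walks(n_ts=50, sz=32, d=1))
>>> X_new = random_walks(n_ts=5, sz=32, d=1, random_state=1)
>>> km.predict(X_new).shape
(5,)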
set_params(**params)[source]¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters: - **params : dict
Estimator parameters.
Returns: - self : estimator instance
Estimator instance.
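A minimal illustrative sketch (parameter values are arbitrary):
>>> # Illustrative sketch: update hyper-parameters and keep using the
>>> # same estimator instance (set_params returns self).
>>> from tslearn.clustering import TimeSeriesKMeans
>>> km = TimeSeriesKMeans(n_clusters=3, metric="euclidean")
>>> km = km.set_params(metric="dtw", max_iter_barycenter=10)
>>> km.metric
'dtw'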
to_hdf5(path)[source]¶
Save model to a HDF5 file. Requires h5py (http://docs.h5py.org/).
Parameters: - path : str
Full file path. File must not already exist.
Raises: - FileExistsError
If a file with the same path already exists.