tslearn.clustering.KShape
- class tslearn.clustering.KShape(n_clusters=3, max_iter=100, tol=1e-06, n_init=1, verbose=False, random_state=None, init='random')
KShape clustering for time series.
KShape was originally presented in [1].
- Parameters:
- n_clusters : int (default: 3)
Number of clusters to form.
- max_iter : int (default: 100)
Maximum number of iterations of the k-Shape algorithm.
- tol : float (default: 1e-6)
Inertia variation threshold. If inertia varies by less than this threshold between two consecutive iterations, the model is considered to have converged and the algorithm stops.
- n_init : int (default: 1)
Number of times the k-Shape algorithm will be run with different centroid seeds. The final result is the best of the n_init consecutive runs in terms of inertia.
- verbose : bool (default: False)
Whether or not to print information about the inertia while learning the model.
- random_state : int or numpy.random.RandomState, optional
Generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
- init : {‘random’ or ndarray} (default: ‘random’)
Method for initialization. ‘random’: choose k observations (rows) at random from data for the initial centroids. If an ndarray is passed, it should be of shape (n_clusters, ts_size, d) and gives the initial centers.
- Attributes:
- cluster_centers_ : numpy.ndarray of shape (n_clusters, sz, d)
Cluster centroids.
- labels_ : numpy.ndarray of integers with shape (n_ts, )
Cluster label of each time series.
- inertia_ : float
Sum of distances of samples to their closest cluster center.
- n_iter_ : int
The number of iterations performed during fit.
Notes
This method requires a dataset of equal-sized time series.
References
[1] J. Paparrizos & L. Gravano. k-Shape: Efficient and Accurate Clustering of Time Series. SIGMOD 2015. pp. 1855-1870.
Examples
>>> from tslearn.clustering import KShape
>>> from tslearn.generators import random_walks
>>> from tslearn.preprocessing import TimeSeriesScalerMeanVariance
>>> X = random_walks(n_ts=50, sz=32, d=1)
>>> X = TimeSeriesScalerMeanVariance(mu=0., std=1.).fit_transform(X)
>>> ks = KShape(n_clusters=3, n_init=1, random_state=0).fit(X)
>>> ks.cluster_centers_.shape
(3, 32, 1)
Methods
- fit(X[, y]): Compute k-Shape clustering.
- fit_predict(X[, y]): Fit k-Shape clustering using X and then predict the closest cluster each time series in X belongs to.
- from_hdf5(path): Load model from an HDF5 file.
- from_json(path): Load model from a JSON file.
- from_pickle(path): Load model from a pickle file.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- predict(X): Predict the closest cluster each time series in X belongs to.
- set_params(**params): Set the parameters of this estimator.
- to_hdf5(path): Save model to an HDF5 file.
- to_json(path): Save model to a JSON file.
- to_pickle(path): Save model to a pickle file.
- fit(X, y=None)
Compute k-Shape clustering.
- Parameters:
- X : array-like of shape=(n_ts, sz, d)
Time series dataset.
- y
Ignored
- fit_predict(X, y=None)
Fit k-Shape clustering using X and then predict the closest cluster each time series in X belongs to.
It is more efficient to use this method than to sequentially call fit and predict.
- Parameters:
- X : array-like of shape=(n_ts, sz, d)
Time series dataset to predict.
- y
Ignored
- Returns:
- labels : array of shape=(n_ts, )
Index of the cluster each sample belongs to.
- classmethod from_hdf5(path)
Load model from an HDF5 file. Requires h5py (http://docs.h5py.org/).
- Parameters:
- path : str
Full path to file.
- Returns:
- Model instance
- classmethod from_json(path)
Load model from a JSON file.
- Parameters:
- path : str
Full path to file.
- Returns:
- Model instance
- classmethod from_pickle(path)
Load model from a pickle file.
- Parameters:
- path : str
Full path to file.
- Returns:
- Model instance
- get_metadata_routing()
Get metadata routing of this object.
Please check the scikit-learn User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- predict(X)
Predict the closest cluster each time series in X belongs to.
- Parameters:
- X : array-like of shape=(n_ts, sz, d)
Time series dataset to predict.
- Returns:
- labels : array of shape=(n_ts, )
Index of the cluster each sample belongs to.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- to_hdf5(path)
Save model to an HDF5 file. Requires h5py (http://docs.h5py.org/).
- Parameters:
- path : str
Full file path. File must not already exist.
- Raises:
- FileExistsError
If a file with the same path already exists.