KernelKMeans#

class tslearn.clustering.KernelKMeans(n_clusters=3, kernel='gak', max_iter=50, tol=1e-06, n_init=1, kernel_params=None, n_jobs=None, verbose=0, random_state=None)[source]#

Kernel K-means.

Parameters:

n_clustersint (default: 3)

Number of clusters to form.

kernelstring, or callable (default: “gak”)

The kernel should be either “gak”, in which case the Global Alignment Kernel from [2] is used, or a value that is accepted as a metric by scikit-learn’s pairwise_kernels.

max_iterint (default: 50)

Maximum number of iterations of the k-means algorithm for a single run.

tolfloat (default: 1e-6)

Inertia variation threshold. If at some point, inertia varies less than this threshold between two consecutive iterations, the model is considered to have converged and the algorithm stops.

n_initint (default: 1)

Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.

kernel_paramsdict or None (default: None)

Kernel parameters to be passed to the kernel function. None means no kernel parameter is set.

For the Global Alignment Kernel, the only parameter of interest is sigma. If set to ‘auto’, it is computed based on a sampling of the training set (cf. tslearn.metrics.sigma_gak). If no specific value is set for sigma, it defaults to 1. A RuntimeError is raised at fit time when the computed or explicit value is close to 0 and is therefore not compatible with the ‘gak’ kernel.

n_jobsint or None, optional (default=None)

The number of jobs to run in parallel for GAK cross-similarity matrix computations. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See scikit-learn’s Glossary for more details.

verboseint (default: 0)

If nonzero, joblib progress messages are printed.

random_stateinteger or numpy.random.RandomState, optional

Generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.

Attributes:

labels_numpy.ndarray: Labels of each point
inertia_float: Sum of distances of samples to their closest cluster center (computed using the kernel trick).
sample_weight_numpy.ndarray: The weight given to each sample from the data provided to fit.
n_iter_int: The number of iterations performed during fit.
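To illustrate what "computed using the kernel trick" means, and how max_iter, tol and n_init interact, here is a NumPy-only sketch of textbook kernel k-means in the spirit of [1]. The helper names kernel_distances and kernel_kmeans are hypothetical, and this is not tslearn's implementation: the library may normalize the kernel, weight samples, and handle empty clusters differently.

```python
import numpy as np


def kernel_distances(K, labels, n_clusters):
    """Squared feature-space distance from each sample to each cluster mean."""
    dist = np.empty((K.shape[0], n_clusters))
    for k in range(n_clusters):
        mask = labels == k
        size = mask.sum()
        if size == 0:
            raise ValueError("empty cluster")
        # ||phi(x_i) - m_k||^2 = K_ii - 2 * mean_j K_ij + mean_{j,l} K_jl
        dist[:, k] = (
            np.diag(K)
            - 2.0 * K[:, mask].sum(axis=1) / size
            + K[np.ix_(mask, mask)].sum() / size ** 2
        )
    return dist


def kernel_kmeans(K, n_clusters, max_iter=50, tol=1e-6, n_init=1, seed=0):
    """Return the labels and inertia of the best of n_init runs."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _run in range(n_init):
        # Random initial partition (the "centroid seeds" of this sketch).
        labels = rng.integers(n_clusters, size=K.shape[0])
        inertia = np.inf
        try:
            for _it in range(max_iter):
                dist = kernel_distances(K, labels, n_clusters)
                labels = dist.argmin(axis=1)
                new_inertia = dist.min(axis=1).sum()
                # Stop once inertia changes by less than tol.
                converged = abs(inertia - new_inertia) < tol
                inertia = new_inertia
                if converged:
                    break
        except ValueError:
            continue  # discard runs that produce an empty cluster
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels, best_inertia


# Two well-separated 1-D groups, clustered from a linear kernel only.
x = np.concatenate([np.arange(10) * 0.1, 10.0 + np.arange(10) * 0.1])
K = np.outer(x, x)
labels, inertia = kernel_kmeans(K, n_clusters=2, n_init=5)
print(labels, inertia)
```

The algorithm never sees the points themselves, only the Gram matrix K, which is why any valid kernel (including GAK on time series) can be plugged in.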

Notes

The training data are saved to disk when this model is serialized, which may result in a large model file if the training dataset is large.

References

[1]

Kernel k-means, Spectral Clustering and Normalized Cuts. Inderjit S. Dhillon, Yuqiang Guan, Brian Kulis. KDD 2004.

[2]

Fast Global Alignment Kernels. Marco Cuturi. ICML 2011.

Examples

>>> import numpy
>>> from tslearn.clustering import KernelKMeans
>>> from tslearn.generators import random_walks
>>> X = random_walks(n_ts=50, sz=32, d=1)
>>> gak_km = KernelKMeans(n_clusters=3, kernel="gak", random_state=0)
>>> gak_km.fit(X)
KernelKMeans(...)
>>> print(numpy.unique(gak_km.labels_))
[0 1 2]

Methods

`fit`(X[, y, sample_weight])	Compute kernel k-means clustering.
`fit_predict`(X[, y])	Fit kernel k-means clustering using X and then predict the closest cluster each time series in X belongs to.
`from_hdf5`(path)	Load model from an HDF5 file.
`from_json`(path)	Load model from a JSON file.
`from_pickle`(path)	Load model from a pickle file.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X)	Predict the closest cluster each time series in X belongs to.
`set_fit_request`(*[, sample_weight])	Configure whether metadata should be requested to be passed to the `fit` method.
`set_params`(**params)	Set the parameters of this estimator.
`to_hdf5`(path)	Save model to an HDF5 file.
`to_json`(path)	Save model to a JSON file.
`to_pickle`(path)	Save model to a pickle file.

fit(X, y=None, sample_weight=None)[source]#

Compute kernel k-means clustering.

Parameters:

Xarray-like of shape=(n_ts, sz, d): Time series dataset.
y: Ignored
sample_weightarray-like of shape=(n_ts, ) or None (default: None): Weights to be given to time series in the learning process. By default, all time series weights are equal.

fit_predict(X, y=None)[source]#

Fit kernel k-means clustering using X and then predict the closest cluster each time series in X belongs to.

It is more efficient to use this method than to sequentially call fit and predict.

Parameters:

Xarray-like of shape=(n_ts, sz, d): Time series dataset to predict.
y: Ignored

Returns:

labelsarray of shape=(n_ts, ): Index of the cluster each sample belongs to.

classmethod from_hdf5(path)[source]#

Load model from an HDF5 file. Requires h5py (http://docs.h5py.org/).

Parameters:

pathstr: Full path to file.

Returns:

Model instance

classmethod from_json(path)[source]#

Load model from a JSON file.

Parameters:

pathstr: Full path to file.

Returns:

Model instance

classmethod from_pickle(path)[source]#

Load model from a pickle file.

Parameters:

pathstr: Full path to file.

Returns:

Model instance

get_metadata_routing()#

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)#

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

predict(X)[source]#

Predict the closest cluster each time series in X belongs to.

Parameters:

Xarray-like of shape=(n_ts, sz, d): Time series dataset to predict.

Returns:

labelsarray of shape=(n_ts, ): Index of the cluster each sample belongs to.

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KernelKMeans#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in fit.

Returns:

selfobject: The updated object.

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

to_hdf5(path)[source]#

Save model to an HDF5 file. Requires h5py (http://docs.h5py.org/).

Parameters:

pathstr: Full file path. File must not already exist.

Raises:

FileExistsError: If a file with the same path already exists.

to_json(path)[source]#

Save model to a JSON file.

Parameters:

pathstr: Full file path.

to_pickle(path)[source]#

Save model to a pickle file.

Parameters:

pathstr: Full file path.

Examples using `tslearn.clustering.KernelKMeans`#

Kernel k-means
