TimeSeriesDBSCAN

class tslearn.clustering.TimeSeriesDBSCAN(eps=0.5, min_ts=5, metric='dtw', metric_params=None, n_jobs=None)

DBSCAN clustering for time series.

Parameters:
eps: float (default: 0.5)

The maximum distance between two time series for one to be considered as in the neighborhood of the other.

min_ts: int (default: 5)

The minimum number of time series in a neighborhood (including the series itself) for a time series to be considered a core point.

metric: {‘dtw’, ‘ctw’, ‘frechet’, ‘euclidean’, ‘precomputed’} (default: ‘dtw’)

Metric to be used for similarity measure between time series.

metric_params: dict or None (default: None)

Additional keyword arguments to pass to the metric function. For metrics that support parallel computation of the cross-distance matrix, any n_jobs key passed in metric_params is overridden by the n_jobs argument. Parameters that do not match the signature of the metric computation function are ignored.

n_jobs: int or None (default: None)

The number of jobs to run in parallel for cross-distance matrix computations. Ignored if the cross-distance matrix cannot be computed in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See scikit-learn's Glossary for more details.
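To make the roles of eps and min_ts concrete, here is a minimal pure-Python sketch of the core-point rule described above, applied to a precomputed distance matrix. It is an illustration only, not the library's implementation: the helper name `core_point_mask` and the toy distances are hypothetical, and treating a distance exactly equal to eps as "within the neighborhood" is an assumption.

```python
def core_point_mask(dist, eps=0.5, min_ts=5):
    """Return one boolean per series: True if the series is a core point.

    dist is a square matrix (list of lists) of pairwise distances.
    Assumption: a distance equal to eps still counts as a neighbor.
    """
    mask = []
    for row in dist:
        # The diagonal entry is 0, so each series counts itself as a neighbor.
        n_neighbors = sum(1 for d in row if d <= eps)
        mask.append(n_neighbors >= min_ts)
    return mask


# Hypothetical toy distances: three series close together, one far away.
toy_dist = [
    [0.0, 0.2, 0.3, 5.0],
    [0.2, 0.0, 0.1, 5.1],
    [0.3, 0.1, 0.0, 4.9],
    [5.0, 5.1, 4.9, 0.0],
]
mask = core_point_mask(toy_dist, eps=0.5, min_ts=3)
print(mask)  # [True, True, True, False]
```

With min_ts=3, each of the first three series has three neighbors (itself plus two others) within eps, while the last one has only itself and therefore cannot be a core point.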

Attributes:
core_ts_indices_: numpy.ndarray of shape (n_core_ts,)

Indices of core time series.

components_: numpy.ndarray of shape (n_core_ts, sz, d)

Copy of each core time series found by training.

labels_: numpy.ndarray of integers of shape (n_ts,)

Labels of each time series. Noisy time series are given the label -1.

n_features_in_: int

Number of features seen during training.
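Since noise is encoded as -1 in labels_, counting clusters requires setting that label aside. The sketch below shows one way to summarize a label array using only the standard library. The helper name `summarize_labels` and the example labels are hypothetical and stand in for the labels_ attribute of a fitted model.

```python
from collections import Counter


def summarize_labels(labels):
    """Count clusters and noisy series in a DBSCAN-style label sequence."""
    counts = Counter(int(label) for label in labels)
    # Label -1 marks noise, so remove it before counting clusters.
    n_noise = counts.pop(-1, 0)
    return {"n_clusters": len(counts), "n_noise": n_noise, "sizes": dict(counts)}


# Hypothetical labels standing in for a fitted model's labels_ attribute.
example_labels = [0, 0, 1, -1, 1, 1, -1, 2]
summary = summarize_labels(example_labels)
print(summary)  # {'n_clusters': 3, 'n_noise': 2, 'sizes': {0: 2, 1: 3, 2: 1}}
```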

Notes

If metric is set to “euclidean”, the algorithm expects a dataset of equal-sized time series.

Examples

>>> import numpy as np
>>> from tslearn.clustering import TimeSeriesDBSCAN
>>> from tslearn.generators import random_walk_blobs
>>> from tslearn.preprocessing import TimeSeriesScalerMeanVariance
>>> X, y = random_walk_blobs(n_ts_per_blob=20, sz=32, d=2, n_blobs=4, random_state=0)
>>> X = TimeSeriesScalerMeanVariance(mu=0., std=1.).fit_transform(X)
>>> db = TimeSeriesDBSCAN(eps=4, min_ts=3).fit(X)
>>> np.unique(db.labels_)  # Clusters and noise
array([-1,  0,  1,  2,  3])
>>> list(db.labels_).count(-1)  # Number of noisy time series
37

Methods

fit(X[, y])

Compute DBSCAN clustering.

fit_predict(X[, y])

Compute DBSCAN clustering and return cluster labels.

from_hdf5(path)

Load model from an HDF5 file.

from_json(path)

Load model from a JSON file.

from_pickle(path)

Load model from a pickle file.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.

to_hdf5(path)

Save model to an HDF5 file.

to_json(path)

Save model to a JSON file.

to_pickle(path)

Save model to a pickle file.

fit(X, y=None)

Compute DBSCAN clustering.

Parameters:
X: array-like of shape (n_ts, sz, d)

Time series dataset.

y: Ignored

Not used, present here for API consistency by convention.

Returns:
TimeSeriesDBSCAN

The fitted estimator.

fit_predict(X, y=None)

Compute DBSCAN clustering and return cluster labels.

Parameters:
X: array-like of shape (n_ts, sz, d)

Time series dataset.

y: Ignored

Not used, present here for API consistency by convention.

Returns:
labels: array of shape (n_ts,)

Index of the cluster each time series belongs to. Noisy time series are given the label -1.

classmethod from_hdf5(path)

Load model from an HDF5 file. Requires h5py (http://docs.h5py.org/).

Parameters:
path: str

Full path to file.

Returns:
Model instance

classmethod from_json(path)

Load model from a JSON file.

Parameters:
path: str

Full path to file.

Returns:
Model instance

classmethod from_pickle(path)

Load model from a pickle file.

Parameters:
path: str

Full path to file.

Returns:
Model instance

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:
routing: MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep: bool (default: True)

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params: dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params: dict

Estimator parameters.

Returns:
self: estimator instance

Estimator instance.

to_hdf5(path)

Save model to an HDF5 file. Requires h5py (http://docs.h5py.org/).

Parameters:
path: str

Full file path. File must not already exist.

Raises:
FileExistsError

If a file with the same path already exists.

to_json(path)

Save model to a JSON file.

Parameters:
path: str

Full file path.

to_pickle(path)

Save model to a pickle file.

Parameters:
path: str

Full file path.

Examples using tslearn.clustering.TimeSeriesDBSCAN

DBSCAN
