LearningShapelets#

class tslearn.shapelets.LearningShapelets(n_shapelets_per_size=None, max_iter=10000, batch_size=256, verbose=0, optimizer='sgd', weight_regularizer=0.0, shapelet_length=0.15, total_lengths=3, max_size=None, scale=False, random_state=None)[source]#

Learning Time-Series Shapelets model.

Learning Time-Series Shapelets was originally presented in [1].

From an input (possibly multidimensional) time series $x$ and a set of shapelets $\{s_i\}_i$, the $i$-th coordinate of the Shapelet transform is computed as:

\[ST(x, s_i) = \min_t \sum_{\delta_t} \left\|x(t+\delta_t) - s_i(\delta_t)\right\|_2^2\]

The Shapelet model consists in a logistic regression layer on top of this transform. Shapelet coefficients as well as logistic regression weights are optimized by gradient descent on a L2-penalized cross-entropy loss.

Parameters:

n_shapelets_per_size: dict (default: None): Dictionary giving, for each shapelet size (key), the number of such shapelets to be trained (value). If None, grabocka_params_to_shapelet_size_dict() is used and the size used to compute is that of the shortest time series passed at fit time.
max_iter: int (default: 10,000): Number of training epochs.

Changed in version 0.3: default value for max_iter is set to 10,000 instead of 100
batch_size: int (default: 256): Batch size to be used.
verbose: {0, 1, 2} (default: 0): keras verbose level.
optimizer: str or keras.optimizers.Optimizer (default: “sgd”): keras optimizer to use for training.
weight_regularizer: float (default: 0.): Strength of the L2 regularizer to use for training the classification (softmax) layer. If 0, no regularization is performed.
shapelet_length: float (default: 0.15): The length of the shapelets, expressed as a fraction of the time series length. Used only if n_shapelets_per_size is None.
total_lengths: int (default: 3): The number of different shapelet lengths. Will extract shapelets of length i * shapelet_length for i in [1, total_lengths] Used only if n_shapelets_per_size is None.
max_size: int or None (default: None): Maximum size for time series to be fed to the model. If None, it is set to the size (number of timestamps) of the training time series.
scale: bool (default: False): Whether input data should be scaled for each feature of each time series to lie in the [0-1] interval. Default for this parameter is set to False in version 0.4 to ensure backward compatibility, but is likely to change in a future version.
random_stateint or None, optional (default: None): The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If None, the random number generator is the RandomState instance used by np.random.

Attributes:

shapelets_numpy.ndarray of objects, each object being a time series: Set of time-series shapelets.
shapelets_as_time_series_numpy.ndarray of shape (n_shapelets, sz_shp, d): Set of time-series shapelets formatted as a tslearn time series dataset.
where `sz_shp` is the maximum of all shapelet sizes: Set of time-series shapelets formatted as a tslearn time series dataset.
transformer_model_keras.Model: Transforms an input dataset of timeseries into distances to the learned shapelets.
locator_model_keras.Model: Returns the indices where each of the shapelets can be found (minimal distance) within each of the timeseries of the input dataset.
model_keras.Model: Directly predicts the class probabilities for the input timeseries.
history_dict: Dictionary of losses and metrics recorded during fit.

Notes

This model does not support HDF5 serialization.

References

[1]

J. Grabocka et al. Learning Time-Series Shapelets. SIGKDD 2014.

Examples

>>> from tslearn.generators import random_walk_blobs
>>> X, y = random_walk_blobs(n_ts_per_blob=10, sz=16, d=2, n_blobs=3)
>>> clf = LearningShapelets(n_shapelets_per_size={4: 5},
...                         max_iter=1, verbose=0)
>>> clf.fit(X, y).shapelets_.shape
(5,)
>>> clf.shapelets_[0].shape
(4, 2)
>>> clf.predict(X).shape
(30,)
>>> clf.predict_proba(X).shape
(30, 3)
>>> clf.transform(X).shape
(30, 5)

Methods

`fit`(X, y)	Learn time-series shapelets.
`fit_transform`(X[, y])	Fit to data, then transform it.
`from_hdf5`(path)	Load model from a HDF5 file.
`from_json`(path)	Load model from a JSON file.
`from_pickle`(path)	Load model from a pickle file.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`get_weights`([layer_name])	Return model weights (or weights for a given layer if layer_name is provided).
`locate`(X)	Compute shapelet match location for a set of time series.
`predict`(X)	Predict class for a given set of time series.
`predict_proba`(X)	Predict class probability for a given set of time series.
`score`(X, y[, sample_weight])	Return accuracy on provided data and labels.
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`set_score_request`(*[, sample_weight])	Configure whether metadata should be requested to be passed to the `score` method.
`set_weights`(weights[, layer_name])	Set model weights (or weights for a given layer if layer_name is provided).
`to_hdf5`(path)	LearningShapelet is not HDF5 serializable
`to_json`(path)	Save model to a JSON file.
`to_pickle`(path)	Save model to a pickle file.
`transform`(X)	Generate shapelet transform for a set of time series.

fit(X, y)[source]#

Learn time-series shapelets.

Parameters:

Xarray-like of shape=(n_ts, sz, d): Time series dataset.
yarray-like of shape=(n_ts, ): Time series labels.

fit_transform(X, y=None, **fit_params)#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

Xarray-like of shape (n_samples, n_features): Input samples.
yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None: Target values (None for unsupervised transformations).
**fit_paramsdict: Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.

Returns:

X_newndarray array of shape (n_samples, n_features_new): Transformed array.

classmethod from_hdf5(path)[source]#

Load model from a HDF5 file. Requires h5py http://docs.h5py.org/

Parameters:

pathstr: Full path to file.

Returns:

Model instance

classmethod from_json(path)[source]#

Load model from a JSON file.

Parameters:

pathstr: Full path to file.

Returns:

Model instance

classmethod from_pickle(path)[source]#

Load model from a pickle file.

Parameters:

pathstr: Full path to file.

Returns:

Model instance

get_metadata_routing()#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)#

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

get_weights(layer_name=None)[source]#

Return model weights (or weights for a given layer if layer_name is provided).

Parameters:

layer_name: str or None (default: None)

Name of the layer for which weights should be returned. If None, all model weights are returned. Available layer names with weights are:

“shapelets_i_j” with i an integer for the shapelet id and j an integer for the dimension
“classification” for the final classification layer

Returns:

list: list of model (or layer) weights

Examples

>>> from tslearn.generators import random_walk_blobs
>>> X, y = random_walk_blobs(n_ts_per_blob=100, sz=256, d=1, n_blobs=3)
>>> clf = LearningShapelets(n_shapelets_per_size={10: 5}, max_iter=1,
...                     verbose=0)
>>> clf.fit(X, y).get_weights("classification")[0].shape
(5, 3)
>>> clf.get_weights("shapelets_0")[0].shape
(5, 10, 1)
>>> len(clf.get_weights("shapelets_0"))
1

locate(X)[source]#

Compute shapelet match location for a set of time series.

Parameters:

Xarray-like of shape=(n_ts, sz, d): Time series dataset.

Returns:

array of shape=(n_ts, n_shapelets): Location of the shapelet matches for the provided time series.

Examples

>>> from tslearn.generators import random_walk_blobs
>>> X = numpy.zeros((3, 10, 1))
>>> X[0, 4:7, 0] = numpy.array([1, 2, 3])
>>> y = [1, 0, 0]
>>> # Data is all zeros except a motif 1-2-3 in the first time series
>>> clf = LearningShapelets(n_shapelets_per_size={3: 1}, max_iter=1,
...                     verbose=0)
>>> _ = clf.fit(X, y)
>>> weights_shapelet = [
...     numpy.array([[[1], [2], [3]]])
... ]
>>> clf.set_weights(weights_shapelet, layer_name="shapelets_0")
>>> clf.locate(X)
array([[4],
       [0],
       [0]])

predict(X)[source]#

Predict class for a given set of time series.

Parameters:

Xarray-like of shape=(n_ts, sz, d): Time series dataset.

Returns:

array of shape=(n_ts, ) or (n_ts, n_classes), depending on the shape
of the label vector provided at training time.: Index of the cluster each sample belongs to or class probability matrix, depending on what was provided at training time.

predict_proba(X)[source]#

Predict class probability for a given set of time series.

Parameters:

Xarray-like of shape=(n_ts, sz, d): Time series dataset.

Returns:

array of shape=(n_ts, n_classes),: Class probability matrix.

score(X, y, sample_weight=None)#

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:

Xarray-like of shape (n_samples, n_features): Test samples.
yarray-like of shape (n_samples,) or (n_samples, n_outputs): True labels for X.
sample_weightarray-like of shape (n_samples,), default=None: Sample weights.

Returns:

scorefloat: Mean accuracy of self.predict(X) w.r.t. y.

set_output(*, transform=None)#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform{“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:

selfestimator instance: Estimator instance.

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LearningShapelets#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns:

selfobject: The updated object.

set_weights(weights, layer_name=None)[source]#

Set model weights (or weights for a given layer if layer_name is provided).

Parameters:

weights: list of ndarrays

Weights to set for the model / target layer

layer_name: str or None (default: None)

Name of the layer for which weights should be set. If None, all model weights are set. Available layer names with weights are:

“shapelets_i_j” with i an integer for the shapelet id and j an integer for the dimension
“classification” for the final classification layer

Examples

>>> from tslearn.generators import random_walk_blobs
>>> X, y = random_walk_blobs(n_ts_per_blob=10, sz=16, d=1, n_blobs=3)
>>> clf = LearningShapelets(n_shapelets_per_size={3: 1}, max_iter=1,
...                     verbose=0)
>>> _ = clf.fit(X, y)
>>> weights_shapelet = [
...     numpy.array([[[1], [2], [3]]])
... ]
>>> clf.set_weights(weights_shapelet, layer_name="shapelets_0")
>>> clf.shapelets_as_time_series_
array([[[1.],
        [2.],
        [3.]]])

property shapelets_as_time_series_[source]#

Set of time-series shapelets formatted as a tslearn time series dataset.

Examples

>>> from tslearn.generators import random_walk_blobs
>>> X, y = random_walk_blobs(n_ts_per_blob=10, sz=256, d=1, n_blobs=3)
>>> model = LearningShapelets(n_shapelets_per_size={3: 2, 4: 1},
...                       max_iter=1)
>>> _ = model.fit(X, y)
>>> model.shapelets_as_time_series_.shape
(3, 4, 1)