LearningShapelets
- class tslearn.shapelets.LearningShapelets(n_shapelets_per_size=None, max_iter=10000, batch_size=256, verbose=0, optimizer='sgd', weight_regularizer=0.0, shapelet_length=0.15, total_lengths=3, max_size=None, scale=False, random_state=None)
Learning Time-Series Shapelets model.
Learning Time-Series Shapelets was originally presented in [1].
From an input (possibly multidimensional) time series \(x\) and a set of shapelets \(\{s_i\}_i\), the \(i\)-th coordinate of the Shapelet transform is computed as:
\[ST(x, s_i) = \min_t \sum_{\delta_t} \left\|x(t+\delta_t) - s_i(\delta_t)\right\|_2^2\]
The Shapelet model consists of a logistic regression layer on top of this transform. Shapelet coefficients and logistic regression weights are optimized by gradient descent on an L2-penalized cross-entropy loss.
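The transform defined above can be sketched in plain Python. This is an illustration of the formula only, not the library's implementation: the function names `shapelet_distance` and `shapelet_transform` are hypothetical, and time series are represented as lists of timestamps, each timestamp being a list of `d` features.

```python
def shapelet_distance(x, s):
    """Minimum, over all start positions t, of the summed squared
    distance between shapelet s and the window x[t:t + len(s)]."""
    n, m = len(x), len(s)
    if m > n:
        raise ValueError("shapelet is longer than the time series")
    best = float("inf")
    for t in range(n - m + 1):
        # Sum over offsets delta_t and over the d dimensions.
        dist = sum(
            (x[t + dt][k] - s[dt][k]) ** 2
            for dt in range(m)
            for k in range(len(s[dt]))
        )
        best = min(best, dist)
    return best


def shapelet_transform(x, shapelets):
    """One coordinate per shapelet, as in ST(x, s_i)."""
    return [shapelet_distance(x, s) for s in shapelets]


# Univariate series containing the motif 1-2-3.
x = [[0.0], [0.0], [1.0], [2.0], [3.0], [0.0]]
shapelets = [
    [[1.0], [2.0], [3.0]],  # exact match somewhere in x
    [[3.0], [1.0]],         # no exact match
]
features = shapelet_transform(x, shapelets)
print(features)  # [0.0, 1.0]
```

The first shapelet occurs verbatim in `x`, so its coordinate is 0; the second never matches exactly, so its coordinate is the smallest squared distance over all windows.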
- Parameters:
- n_shapelets_per_size: dict (default: None)
Dictionary giving, for each shapelet size (key), the number of such shapelets to be trained (value). If None, grabocka_params_to_shapelet_size_dict() is used, and the time series size used in that computation is that of the shortest time series passed at fit time.
- max_iter: int (default: 10,000)
Number of training epochs.
Changed in version 0.3: default value for max_iter is set to 10,000 instead of 100
- batch_size: int (default: 256)
Batch size to be used.
- verbose: {0, 1, 2} (default: 0)
Keras verbosity level.
- optimizer: str or keras.optimizers.Optimizer (default: “sgd”)
Keras optimizer to use for training.
- weight_regularizer: float (default: 0.)
Strength of the L2 regularizer to use for training the classification (softmax) layer. If 0, no regularization is performed.
- shapelet_length: float (default: 0.15)
The length of the shapelets, expressed as a fraction of the time series length. Used only if n_shapelets_per_size is None.
- total_lengths: int (default: 3)
The number of different shapelet lengths. Shapelets of length i * shapelet_length are extracted for i in [1, total_lengths]. Used only if n_shapelets_per_size is None.
- max_size: int or None (default: None)
Maximum size for time series to be fed to the model. If None, it is set to the size (number of timestamps) of the training time series.
- scale: bool (default: False)
Whether input data should be scaled, for each feature of each time series, to lie in the [0, 1] interval. The default is set to False in version 0.4 to ensure backward compatibility, but is likely to change in a future version.
- random_state: int or None, optional (default: None)
The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If None, the random number generator is the RandomState instance used by np.random.
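Two of the parameters above lend themselves to a small sketch: how `shapelet_length` and `total_lengths` combine into concrete shapelet lengths, and what per-feature, per-series scaling to [0, 1] means for `scale=True`. Both helpers below are hypothetical and written in plain Python. The rounding rule (truncating to an integer, with a minimum of 1) and the handling of constant features (mapped to 0) are assumptions made for illustration and may differ from what the library does.

```python
def default_shapelet_lengths(ts_size, shapelet_length=0.15, total_lengths=3):
    """Lengths i * base for i in [1, total_lengths], where base is
    shapelet_length expressed as a fraction of the series length.
    Assumption: base is truncated to an int and is at least 1."""
    base = max(1, int(shapelet_length * ts_size))
    return [i * base for i in range(1, total_lengths + 1)]


def minmax_scale_series(series):
    """Scale each feature of a single time series to [0, 1].
    series is a list of timestamps, each a list of d features.
    Assumption: a constant feature is mapped to 0."""
    d = len(series[0])
    lo = [min(point[k] for point in series) for k in range(d)]
    hi = [max(point[k] for point in series) for k in range(d)]
    scaled = []
    for point in series:
        row = []
        for k in range(d):
            span = hi[k] - lo[k]
            row.append((point[k] - lo[k]) / span if span > 0 else 0.0)
        scaled.append(row)
    return scaled


lengths = default_shapelet_lengths(ts_size=16, shapelet_length=0.25,
                                   total_lengths=3)
print(lengths)  # [4, 8, 12]

scaled = minmax_scale_series([[1.0, 5.0], [3.0, 5.0], [2.0, 5.0]])
print(scaled)   # [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]]
```

Note that this only covers the lengths; the number of shapelets trained for each length comes from `n_shapelets_per_size` or from grabocka_params_to_shapelet_size_dict().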
- Attributes:
- shapelets_: numpy.ndarray of objects, each object being a time series
Set of time-series shapelets.
- shapelets_as_time_series_: numpy.ndarray of shape (n_shapelets, sz_shp, d), where `sz_shp` is the maximum of all shapelet sizes
Set of time-series shapelets formatted as a tslearn time series dataset.
- transformer_model_: keras.Model
Transforms an input dataset of timeseries into distances to the learned shapelets.
- locator_model_: keras.Model
Returns the indices where each of the shapelets can be found (minimal distance) within each of the timeseries of the input dataset.
- model_: keras.Model
Directly predicts the class probabilities for the input timeseries.
- history_: dict
Dictionary of losses and metrics recorded during fit.
Notes
This model does not support HDF5 serialization.
References
[1]J. Grabocka et al. Learning Time-Series Shapelets. SIGKDD 2014.
Examples
>>> from tslearn.generators import random_walk_blobs
>>> from tslearn.shapelets import LearningShapelets
>>> X, y = random_walk_blobs(n_ts_per_blob=10, sz=16, d=2, n_blobs=3)
>>> clf = LearningShapelets(n_shapelets_per_size={4: 5},
...                         max_iter=1, verbose=0)
>>> clf.fit(X, y).shapelets_.shape
(5,)
>>> clf.shapelets_[0].shape
(4, 2)
>>> clf.predict(X).shape
(30,)
>>> clf.predict_proba(X).shape
(30, 3)
>>> clf.transform(X).shape
(30, 5)
Methods
fit(X, y): Learn time-series shapelets.
fit_transform(X[, y]): Fit to data, then transform it.
from_hdf5(path): Load model from a HDF5 file.
from_json(path): Load model from a JSON file.
from_pickle(path): Load model from a pickle file.
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
get_weights([layer_name]): Return model weights (or weights for a given layer if layer_name is provided).
locate(X): Compute shapelet match location for a set of time series.
predict(X): Predict class for a given set of time series.
predict_proba(X): Predict class probability for a given set of time series.
score(X, y[, sample_weight]): Return accuracy on provided data and labels.
set_output(*[, transform]): Set output container.
set_params(**params): Set the parameters of this estimator.
set_score_request(*[, sample_weight]): Configure whether metadata should be requested to be passed to the score method.
set_weights(weights[, layer_name]): Set model weights (or weights for a given layer if layer_name is provided).
to_hdf5(path): LearningShapelet is not HDF5 serializable
to_json(path): Save model to a JSON file.
to_pickle(path): Save model to a pickle file.
transform(X): Generate shapelet transform for a set of time series.
- fit(X, y)
Learn time-series shapelets.
- Parameters:
- X: array-like of shape=(n_ts, sz, d)
Time series dataset.
- y: array-like of shape=(n_ts, )
Time series labels.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X: array-like of shape (n_samples, n_features)
Input samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params: dict
Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.
- Returns:
- X_new: ndarray of shape (n_samples, n_features_new)
Transformed array.
- classmethod from_hdf5(path)
Load model from a HDF5 file. Requires h5py (http://docs.h5py.org/).
- Parameters:
- path: str
Full path to file.
- Returns:
- Model instance
- classmethod from_json(path)
Load model from a JSON file.
- Parameters:
- path: str
Full path to file.
- Returns:
- Model instance
- classmethod from_pickle(path)
Load model from a pickle file.
- Parameters:
- path: str
Full path to file.
- Returns:
- Model instance
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing: MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params: dict
Parameter names mapped to their values.
- get_weights(layer_name=None)
Return model weights (or weights for a given layer if layer_name is provided).
- Parameters:
- layer_name: str or None (default: None)
Name of the layer for which weights should be returned. If None, all model weights are returned. Available layer names with weights are:
“shapelets_i_j” with i an integer for the shapelet id and j an integer for the dimension
“classification” for the final classification layer
- Returns:
- list
list of model (or layer) weights
Examples
>>> from tslearn.generators import random_walk_blobs
>>> from tslearn.shapelets import LearningShapelets
>>> X, y = random_walk_blobs(n_ts_per_blob=100, sz=256, d=1, n_blobs=3)
>>> clf = LearningShapelets(n_shapelets_per_size={10: 5}, max_iter=1,
...                         verbose=0)
>>> clf.fit(X, y).get_weights("classification")[0].shape
(5, 3)
>>> clf.get_weights("shapelets_0")[0].shape
(5, 10, 1)
>>> len(clf.get_weights("shapelets_0"))
1
- locate(X)
Compute shapelet match location for a set of time series.
- Parameters:
- X: array-like of shape=(n_ts, sz, d)
Time series dataset.
- Returns:
- array of shape=(n_ts, n_shapelets)
Location of the shapelet matches for the provided time series.
Examples
>>> import numpy
>>> from tslearn.shapelets import LearningShapelets
>>> X = numpy.zeros((3, 10, 1))
>>> X[0, 4:7, 0] = numpy.array([1, 2, 3])
>>> y = [1, 0, 0]
>>> # Data is all zeros except a motif 1-2-3 in the first time series
>>> clf = LearningShapelets(n_shapelets_per_size={3: 1}, max_iter=1,
...                         verbose=0)
>>> _ = clf.fit(X, y)
>>> weights_shapelet = [
...     numpy.array([[[1], [2], [3]]])
... ]
>>> clf.set_weights(weights_shapelet, layer_name="shapelets_0")
>>> clf.locate(X)
array([[4],
       [0],
       [0]])
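The match location reported in the example above is the start index of the window with minimal distance to the shapelet. A plain-Python sketch of that idea follows; `locate_shapelet` and `locate` are hypothetical helper names, and breaking ties in favor of the earliest window is an assumption chosen to agree with the zeros returned for the all-zero series above.

```python
def locate_shapelet(x, s):
    """Start index t minimizing the summed squared distance between
    shapelet s and the window x[t:t + len(s)]."""
    n, m = len(x), len(s)
    best_t, best = 0, float("inf")
    for t in range(n - m + 1):
        dist = sum(
            (x[t + dt][k] - s[dt][k]) ** 2
            for dt in range(m)
            for k in range(len(s[dt]))
        )
        # Strict comparison keeps the earliest window on ties (assumption).
        if dist < best:
            best_t, best = t, dist
    return best_t


def locate(X, shapelets):
    """Array-like of shape (n_ts, n_shapelets) with match locations."""
    return [[locate_shapelet(x, s) for s in shapelets] for x in X]


# Same data as the example above: zeros, plus a 1-2-3 motif
# at positions 4..6 of the first series.
X = [[[0.0] for _ in range(10)] for _ in range(3)]
for offset, value in enumerate([1.0, 2.0, 3.0]):
    X[0][4 + offset][0] = value

locations = locate(X, [[[1.0], [2.0], [3.0]]])
print(locations)  # [[4], [0], [0]]
```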
- predict(X)
Predict class for a given set of time series.
- Parameters:
- X: array-like of shape=(n_ts, sz, d)
Time series dataset.
- Returns:
- array of shape=(n_ts, ) or (n_ts, n_classes), depending on the shape of the label vector provided at training time.
Predicted class of each sample or class probability matrix, depending on what was provided at training time.
- predict_proba(X)
Predict class probability for a given set of time series.
- Parameters:
- X: array-like of shape=(n_ts, sz, d)
Time series dataset.
- Returns:
- array of shape=(n_ts, n_classes)
Class probability matrix.
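Since the model is described as a logistic regression layer on top of the shapelet transform, class probabilities can be pictured as a softmax over a linear function of the shapelet distances. The sketch below is a plain-Python illustration under that assumption. The weight values are made up, the helper names are hypothetical, and the weight layout (n_shapelets, n_classes) is chosen to mirror the shape shown for the "classification" layer in the get_weights example.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def predict_proba_one(features, weights, bias):
    """features: shapelet-transform coordinates of one series.
    weights[j][c]: weight of shapelet j for class c.
    bias[c]: intercept of class c."""
    n_classes = len(bias)
    logits = [
        bias[c] + sum(features[j] * weights[j][c]
                      for j in range(len(features)))
        for c in range(n_classes)
    ]
    return softmax(logits)


def predict_one(features, weights, bias):
    """Index of the most probable class."""
    proba = predict_proba_one(features, weights, bias)
    return max(range(len(proba)), key=proba.__getitem__)


# Made-up values: 2 shapelets, 3 classes.
features = [0.0, 1.0]
weights = [[1.0, 0.0, -1.0],
           [0.0, 2.0, 0.0]]
bias = [0.0, 0.0, 0.0]

proba = predict_proba_one(features, weights, bias)
predicted = predict_one(features, weights, bias)
print(proba, predicted)
```

Each row of the probability matrix sums to 1, and predicting a class amounts to taking the most probable column.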
- score(X, y, sample_weight=None)
Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- X: array-like of shape (n_samples, n_features)
Test samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight: array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score: float
Mean accuracy of self.predict(X) w.r.t. y.
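As a reminder of what the returned value means, mean accuracy is the (optionally weighted) fraction of samples whose prediction equals the true label. The plain-Python helper below is a hypothetical sketch of that definition, not the code that score runs.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of exact matches between y_true and y_pred.
    For multi-label targets, each element may be a tuple of labels,
    which must match entirely (subset accuracy)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    matched = sum(
        w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p
    )
    return matched / sum(sample_weight)


unweighted = mean_accuracy([0, 1, 1, 2], [0, 1, 0, 2])
weighted = mean_accuracy([0, 1, 1, 2], [0, 1, 0, 2],
                         sample_weight=[1.0, 1.0, 2.0, 1.0])
print(unweighted, weighted)
```

In the weighted call, the misclassified third sample counts double, so the score drops from 3/4 to 3/5.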
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform: {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
- “default”: Default output format of a transformer
- “pandas”: DataFrame output
- “polars”: Polars output
- None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self: estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params: dict
Estimator parameters.
- Returns:
- self: estimator instance
Estimator instance.
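The <component>__<parameter> naming convention can be illustrated without any estimator at all. The helper below is a hypothetical plain-Python sketch that separates an object's own parameters from those addressed to a nested component. The component name "clf" is made up, and this is not how scikit-learn implements set_params internally.

```python
def split_nested_params(params):
    """Split {'a': 1, 'comp__b': 2} into own parameters and
    per-component parameter dictionaries."""
    own, nested = {}, {}
    for key, value in params.items():
        # Only the first double underscore separates component and parameter.
        name, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"verbose": 0, "clf__max_iter": 500, "clf__batch_size": 64}
)
print(own)     # {'verbose': 0}
print(nested)  # {'clf': {'max_iter': 500, 'batch_size': 64}}
```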
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> LearningShapelets
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns:
- self: object
The updated object.
- set_weights(weights, layer_name=None)
Set model weights (or weights for a given layer if layer_name is provided).
- Parameters:
- weights: list of ndarrays
Weights to set for the model / target layer
- layer_name: str or None (default: None)
Name of the layer for which weights should be set. If None, all model weights are set. Available layer names with weights are:
“shapelets_i_j” with i an integer for the shapelet id and j an integer for the dimension
“classification” for the final classification layer
Examples
>>> import numpy
>>> from tslearn.generators import random_walk_blobs
>>> from tslearn.shapelets import LearningShapelets
>>> X, y = random_walk_blobs(n_ts_per_blob=10, sz=16, d=1, n_blobs=3)
>>> clf = LearningShapelets(n_shapelets_per_size={3: 1}, max_iter=1,
...                         verbose=0)
>>> _ = clf.fit(X, y)
>>> weights_shapelet = [
...     numpy.array([[[1], [2], [3]]])
... ]
>>> clf.set_weights(weights_shapelet, layer_name="shapelets_0")
>>> clf.shapelets_as_time_series_
array([[[1.],
        [2.],
        [3.]]])
- property shapelets_as_time_series_
Set of time-series shapelets formatted as a tslearn time series dataset.
Examples
>>> from tslearn.generators import random_walk_blobs
>>> X, y = random_walk_blobs(n_ts_per_blob=10, sz=256, d=1, n_blobs=3)
>>> model = LearningShapelets(n_shapelets_per_size={3: 2, 4: 1},
...                           max_iter=1)
>>> _ = model.fit(X, y)
>>> model.shapelets_as_time_series_.shape
(3, 4, 1)
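The shape (3, 4, 1) in the example above follows from stacking two shapelets of length 3 and one of length 4 into a single array whose second dimension is the largest shapelet size. A plain-Python sketch of such stacking is given below. The helper `pad_shapelets` is hypothetical, and filling the missing timestamps with NaN is an assumption about how variable-length series are padded.

```python
import math


def pad_shapelets(shapelets):
    """Pad every shapelet to the length of the longest one.
    Each shapelet is a list of timestamps, each a list of d features.
    Assumption: missing timestamps are filled with NaN."""
    max_len = max(len(s) for s in shapelets)
    d = len(shapelets[0][0])
    nan = float("nan")
    return [
        [list(point) for point in s]
        + [[nan] * d for _ in range(max_len - len(s))]
        for s in shapelets
    ]


# Two shapelets of length 3 and one of length 4, all univariate.
shapelets = [
    [[0.1], [0.2], [0.3]],
    [[1.0], [0.0], [1.0]],
    [[0.5], [0.4], [0.3], [0.2]],
]
padded = pad_shapelets(shapelets)
shape = (len(padded), len(padded[0]), len(padded[0][0]))
print(shape)  # (3, 4, 1)
```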
Examples using tslearn.shapelets.LearningShapelets
Learning Shapelets: decision boundaries in 2D distance space