tslearn.early_classification.NonMyopicEarlyClassifier

class tslearn.early_classification.NonMyopicEarlyClassifier(n_clusters=2, base_classifier=None, min_t=1, lamb=1.0, cost_time_parameter=1.0, random_state=None)

Early Classification modelling for time series using the model presented in [1].

Parameters:
n_clusters : int

Number of clusters to form.

base_classifier : Estimator or None

Estimator (instance) to be cloned and used for classifications. If None, the chosen classifier is a 1NN with Euclidean metric.

min_t : int

Earliest time at which a classification can be performed on a time series.

lamb : float

Value of the hyperparameter lambda used, during the computation of the cost function, to evaluate the probability that a time series belongs to a cluster.

cost_time_parameter : float

Parameter of the time cost function, which has the form f(time) = time * cost_time_parameter.

random_state : int or None

Random state of the base estimator.

Attributes:
classifiers_ : dict

A dictionary containing all the classifiers trained for the model, that is, (maximum_time_stamp - min_t) elements.

pyhatyck_ : array-like of shape (maximum_time_stamp - min_t, n_clusters, __n_classes, __n_classes)

Contains the probabilities of being classified as class y_hat given class y and cluster ck for a trained classifier. The penultimate dimension of the array is associated to the true class of the series and the last dimension to the predicted class.

pyck_ : array-like of shape (__n_classes, n_clusters)

Contains the probabilities of being of true class y given a cluster ck.

X_fit_dims : tuple

Shape of the training dataset.

References

[1] A. Dachraoui, A. Bondu & A. Cornuejols. Early classification of time series as a non myopic sequential decision making problem. ECML/PKDD 2015.

Examples

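>>> from tslearn.early_classification import NonMyopicEarlyClassifier
>>> from tslearn.utils import to_time_series_dataset
>>> dataset = to_time_series_dataset([[1, 2, 3, 4, 5, 6],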
...                                   [1, 2, 3, 4, 5, 6],
...                                   [1, 2, 3, 4, 5, 6],
...                                   [1, 2, 3, 3, 2, 1],
...                                   [1, 2, 3, 3, 2, 1],
...                                   [1, 2, 3, 3, 2, 1],
...                                   [3, 2, 1, 1, 2, 3],
...                                   [3, 2, 1, 1, 2, 3]])
>>> y = [0, 0, 0, 1, 1, 1, 0, 0]
>>> model = NonMyopicEarlyClassifier(n_clusters=3, lamb=1000.,
...                                  cost_time_parameter=.1,
...                                  random_state=0)
>>> model.fit(dataset, y)  # doctest: +ELLIPSIS
NonMyopicEarlyClassifier(...)
>>> print(type(model.classifiers_))
<class 'dict'>
>>> print(model.pyck_)
[[0. 1. 1.]
 [1. 0. 0.]]
>>> preds, pred_times = model.predict_class_and_earliness(dataset)
>>> preds
array([0, 0, 0, 1, 1, 1, 0, 0])
>>> pred_times
array([4, 4, 4, 4, 4, 4, 1, 1])
>>> pred_probas, pred_times = model.predict_proba_and_earliness(dataset)
>>> pred_probas
array([[1., 0.],
       [1., 0.],
       [1., 0.],
       [0., 1.],
       [0., 1.],
       [0., 1.],
       [1., 0.],
       [1., 0.]])
>>> pred_times
array([4, 4, 4, 4, 4, 4, 1, 1])
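
The `pyck_` values printed in the example can be reproduced by hand. The following is a minimal sketch, assuming that \(P(y | c_k)\) is estimated by counting labels inside each cluster; this estimator, the helper name `class_given_cluster` and the cluster assignments are assumptions made for illustration, not the library's confirmed implementation.

```python
from collections import Counter


def class_given_cluster(y, cluster_ids, n_classes, n_clusters):
    """Estimate P(y | c_k) by counting labels inside each cluster.

    Returns a nested list of shape (n_classes, n_clusters), the layout
    documented for ``pyck_``. The counting estimator is an assumption.
    """
    pair_counts = Counter(zip(y, cluster_ids))
    cluster_sizes = Counter(cluster_ids)
    return [
        [pair_counts[(label, k)] / cluster_sizes[k] if cluster_sizes[k] else 0.0
         for k in range(n_clusters)]
        for label in range(n_classes)
    ]


y = [0, 0, 0, 1, 1, 1, 0, 0]
# Hypothetical cluster assignments, chosen to be consistent with the
# printed pyck_ above; the fitted model may number its clusters differently.
cluster_ids = [1, 1, 1, 0, 0, 0, 2, 2]

pyck = class_given_cluster(y, cluster_ids, n_classes=2, n_clusters=3)
print(pyck)  # [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
```

Each column sums to 1 because it is a distribution over classes for one cluster. Here every cluster is pure, so every entry is 0 or 1.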

Methods

early_classification_cost(X, y) Compute early classification score.
fit(X, y) Fit early classifier.
get_cluster_probas(Xi) Compute cluster probability \(P(c_k | Xi)\).
get_params([deep]) Get parameters for this estimator.
predict(X) Provide predicted class.
predict_class_and_earliness(X) Provide predicted class as well as prediction timestamps.
predict_proba(X) Probability estimates.
predict_proba_and_earliness(X) Provide probability estimates as well as prediction timestamps.
score(X, y[, sample_weight]) Return the mean accuracy on the given test data and labels.
set_params(**params) Set the parameters of this estimator.

early_classification_cost(X, y)

Compute early classification score.

The score is computed as:

\[1 - acc + \alpha \frac{1}{n} \sum_i t_i\]

where \(\alpha\) is the trade-off parameter (self.cost_time_parameter) and \(t_i\) are prediction timestamps.

Parameters:
X : array-like of shape (n_series, n_timestamps, n_features)

Vector to be scored, where n_series is the number of time series, n_timestamps is the number of timestamps in the series and n_features is the number of features recorded at each timestamp.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

Returns:
float

Early classification cost (a positive number, the lower the better).

Examples

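>>> from tslearn.early_classification import NonMyopicEarlyClassifier
>>> from tslearn.utils import to_time_series_dataset
>>> dataset = to_time_series_dataset([[1, 2, 3, 4, 5, 6],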
...                                   [1, 2, 3, 4, 5, 6],
...                                   [1, 2, 3, 4, 5, 6],
...                                   [1, 2, 3, 3, 2, 1],
...                                   [1, 2, 3, 3, 2, 1],
...                                   [1, 2, 3, 3, 2, 1],
...                                   [3, 2, 1, 1, 2, 3],
...                                   [3, 2, 1, 1, 2, 3]])
>>> y = [0, 0, 0, 1, 1, 1, 0, 0]
>>> model = NonMyopicEarlyClassifier(n_clusters=3, lamb=1000.,
...                                  cost_time_parameter=.1,
...                                  random_state=0)
>>> model.fit(dataset, y)  # doctest: +ELLIPSIS
NonMyopicEarlyClassifier(...)
>>> preds, pred_times = model.predict_class_and_earliness(dataset)
>>> preds
array([0, 0, 0, 1, 1, 1, 0, 0])
>>> pred_times
array([4, 4, 4, 4, 4, 4, 1, 1])
>>> model.early_classification_cost(dataset, y)
0.325
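
The value 0.325 follows directly from the formula \(1 - acc + \alpha \frac{1}{n} \sum_i t_i\). The sketch below recomputes it in plain Python from the predictions and prediction timestamps shown in the example; the helper name `cost_by_hand` is hypothetical and is not part of tslearn.

```python
def cost_by_hand(y_true, y_pred, pred_times, cost_time_parameter):
    """Evaluate 1 - acc + alpha * mean(t_i) for given predictions."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    mean_time = sum(pred_times) / n
    return 1.0 - accuracy + cost_time_parameter * mean_time


y = [0, 0, 0, 1, 1, 1, 0, 0]
preds = [0, 0, 0, 1, 1, 1, 0, 0]        # values printed in the example
pred_times = [4, 4, 4, 4, 4, 4, 1, 1]   # values printed in the example

cost = cost_by_hand(y, preds, pred_times, cost_time_parameter=0.1)
print(round(cost, 3))  # 0.325
```

All eight predictions are correct, so the accuracy term vanishes and the cost reduces to 0.1 times the mean prediction time, 26 / 8 = 3.25.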

fit(X, y)

Fit early classifier.

Parameters:
X : array-like of shape (n_series, n_timestamps, n_features)

Training data, where n_series is the number of time series, n_timestamps is the number of timestamps in the series and n_features is the number of features recorded at each timestamp.

y : array-like of shape (n_samples,)

Target values. Will be cast to X’s dtype if necessary.

Returns:
self : returns an instance of self.

get_cluster_probas(Xi)

Compute cluster probability \(P(c_k | Xi)\).

This quantity is computed using the following formula:

\[P(c_k | Xi) = \frac{s_k(Xi)}{\sum_j s_j(Xi)}\]

where

\[s_k(Xi) = \frac{1}{1 + \exp\left(-\lambda \Delta_k(Xi)\right)}\]

with

\[\Delta_k(Xi) = \frac{\bar{D} - d(Xi, c_k)}{\bar{D}}\]

where \(\bar{D}\) is the average of the distances between Xi and the cluster centers.

Parameters:
Xi : numpy array, shape (t, d)

A time series observed up to time t.

Returns:
probas : numpy array, shape (n_clusters, )

Examples

>>> from tslearn.early_classification import NonMyopicEarlyClassifier
>>> from tslearn.utils import to_time_series, to_time_series_dataset
>>> dataset = to_time_series_dataset([[1, 2, 3, 4, 5, 6],
...                                   [1, 2, 3, 4, 5, 6],
...                                   [1, 2, 3, 4, 5, 6],
...                                   [1, 2, 3, 3, 2, 1],
...                                   [1, 2, 3, 3, 2, 1],
...                                   [1, 2, 3, 3, 2, 1],
...                                   [3, 2, 1, 1, 2, 3],
...                                   [3, 2, 1, 1, 2, 3]])
>>> y = [0, 0, 0, 1, 1, 1, 0, 0]
>>> ts0 = to_time_series([1, 2])
>>> model = NonMyopicEarlyClassifier(n_clusters=3, lamb=0.,
...                                  random_state=0)
>>> probas = model.fit(dataset, y).get_cluster_probas(ts0)
>>> probas.shape
(3,)
>>> probas  # doctest: +ELLIPSIS
array([0.33..., 0.33..., 0.33...])
>>> model = NonMyopicEarlyClassifier(n_clusters=3, lamb=10000.,
...                                  random_state=0)
>>> probas = model.fit(dataset, y).get_cluster_probas(ts0)
>>> probas.shape
(3,)
>>> probas
array([0.5, 0.5, 0. ])
>>> ts1 = to_time_series([3, 2])
>>> model.get_cluster_probas(ts1)
array([0., 0., 1.])
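
The formula for \(P(c_k | Xi)\) can be traced step by step in plain Python. This is a minimal sketch, assuming the distances \(d(Xi, c_k)\) are already known; the function names and the distance values are hypothetical (the values are picked to mimic a prefix that coincides with two cluster centers and lies at distance 2 from the third). It also assumes \(\bar{D} > 0\), since the formula divides by \(\bar{D}\).

```python
import math


def stable_sigmoid(z):
    """Compute 1 / (1 + exp(-z)) without overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cluster_probas_from_distances(distances, lamb):
    """Apply the Delta_k -> s_k -> P(c_k | Xi) chain to given distances."""
    mean_dist = sum(distances) / len(distances)                # D-bar
    deltas = [(mean_dist - d) / mean_dist for d in distances]  # Delta_k
    scores = [stable_sigmoid(lamb * delta) for delta in deltas]  # s_k
    total = sum(scores)
    return [s / total for s in scores]


distances = [0.0, 0.0, 2.0]  # hypothetical distances to three cluster centers

sharp = cluster_probas_from_distances(distances, lamb=10000.0)
flat = cluster_probas_from_distances(distances, lamb=0.0)
print(sharp)  # [0.5, 0.5, 0.0]
print(flat)   # three equal values of 1/3
```

With lamb=0 every \(s_k\) equals 0.5, so the probabilities are uniform regardless of the distances. With a very large lamb the sigmoid saturates, and the probability mass is shared among the clusters that are closer than average.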

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)

Provide predicted class.

Parameters:
X : array-like of shape (n_series, n_timestamps, n_features)

Vector to be scored, where n_series is the number of time series, n_timestamps is the number of timestamps in the series and n_features is the number of features recorded at each timestamp.

Returns:
array, shape (n_samples,)

Predicted classes.

predict_class_and_earliness(X)

Provide predicted class as well as prediction timestamps.

Prediction timestamps are the timestamps at which a prediction is made in an early classification setting.

Parameters:
X : array-like of shape (n_series, n_timestamps, n_features)

Vector to be scored, where n_series is the number of time series, n_timestamps is the number of timestamps in the series and n_features is the number of features recorded at each timestamp.

Returns:
array, shape (n_samples,)

Predicted classes.

array-like of shape (n_series, )

Prediction timestamps.

predict_proba(X)

Probability estimates.

The returned estimates for all classes are ordered by the label of classes.

Parameters:
X : array-like of shape (n_series, n_timestamps, n_features)

Vector to be scored, where n_series is the number of time series, n_timestamps is the number of timestamps in the series and n_features is the number of features recorded at each timestamp.

Returns:
array-like of shape (n_series, n_classes)

Probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

predict_proba_and_earliness(X)

Provide probability estimates as well as prediction timestamps.

Prediction timestamps are the timestamps at which a prediction is made in an early classification setting. The returned estimates for all classes are ordered by the label of classes.

Parameters:
X : array-like of shape (n_series, n_timestamps, n_features)

Vector to be scored, where n_series is the number of time series, n_timestamps is the number of timestamps in the series and n_features is the number of features recorded at each timestamp.

Returns:
array-like of shape (n_series, n_classes)

Probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

array-like of shape (n_series, )

Prediction timestamps.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
