tslearn.clustering.silhouette_score

tslearn.clustering.silhouette_score(X, labels, metric=None, sample_size=None, metric_params=None, n_jobs=None, verbose=0, random_state=None, **kwds)

Compute the mean Silhouette Coefficient of all samples (cf. [1] and [2]).

Read more in the scikit-learn documentation.

Parameters:
X : array, shape = [n_ts, n_ts] if metric == “precomputed”, or [n_ts, sz, d] otherwise

Array of pairwise distances between time series, or a time series dataset.

labels : array, shape = [n_ts]

Predicted labels for each time series.

metric : string, callable or None (default: None)

The metric to use when calculating distance between time series. Should be one of {‘dtw’, ‘softdtw’, ‘euclidean’} or a callable distance function or None. If ‘softdtw’ is passed, a normalized version of Soft-DTW is used that is defined as sdtw_(x,y) := sdtw(x,y) - 1/2(sdtw(x,x)+sdtw(y,y)). If X is the distance array itself, use metric="precomputed". If None, dtw is used.

sample_size : int or None (default: None)

The size of the sample to use when computing the Silhouette Coefficient on a random subset of the data. If sample_size is None, no sampling is used.

metric_params : dict or None (default: None)

Parameter values for the chosen metric. For metrics that support parallel computation of the cross-distance matrix, any n_jobs key passed in metric_params is overridden by the n_jobs argument.

n_jobs : int or None, optional (default=None)

The number of jobs to run in parallel for cross-distance matrix computations. Ignored if the cross-distance matrix cannot be computed using parallelization. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See scikit-learn’s Glossary for more details.

verbose : int (default: 0)

If nonzero, joblib progress messages are printed during the cross-distance matrix computations.

random_state : int, RandomState instance or None, optional (default: None)

The generator used to randomly select a subset of samples. Only used when sample_size is not None. If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

**kwds : optional keyword parameters

Any further parameters are passed directly to the distance function, just as for the metric_params parameter.
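The interaction between sample_size, random_state and metric="precomputed" can be illustrated with a small standard-library sketch. It assumes the subsampling behaves like scikit-learn's (one random set of indices applied to the samples and the labels, and to both axes of a precomputed distance matrix). The helper name subsample and its seed argument are hypothetical and are not part of tslearn.

```python
import random


def subsample(X, labels, sample_size, precomputed=False, seed=None):
    """Hypothetical helper: draw a random subset of samples and labels.

    X is a list of time series, or a square list-of-lists distance
    matrix when precomputed=True.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    idx = rng.sample(range(len(labels)), sample_size)
    sub_labels = [labels[i] for i in idx]
    if precomputed:
        # Keep the same indices on rows and columns so the result
        # is still a square matrix of pairwise distances.
        sub_X = [[X[i][j] for j in idx] for i in idx]
    else:
        sub_X = [X[i] for i in idx]
    return sub_X, sub_labels


# Toy data: five scalar "series" and their pairwise absolute differences.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
labels = [0, 0, 0, 1, 1]
dist = [[abs(p - q) for q in points] for p in points]

sub_dist, sub_labels = subsample(dist, labels, sample_size=3,
                                 precomputed=True, seed=0)
```

With a fixed seed the same subset is drawn on every call, which is the role random_state plays when sample_size is not None.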

Returns:
silhouette : float

Mean Silhouette Coefficient for all samples.
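For reference, the Silhouette Coefficient of a single sample is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. The returned value is the mean over all samples. The following is a minimal pure-Python sketch of that definition on a precomputed distance matrix. It is not tslearn's implementation, and the function name mean_silhouette is hypothetical.

```python
def mean_silhouette(dist, labels):
    """Mean silhouette from a square distance matrix (list of lists)."""
    n = len(labels)
    clusters = sorted(set(labels))
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError("need between 2 and n - 1 distinct labels")
    scores = []
    for i in range(n):
        # Mean distance from sample i to the members of each cluster,
        # leaving i itself out of its own cluster.
        mean_to = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_to[c] = sum(dist[i][j] for j in members) / len(members)
        own = labels[i]
        if own not in mean_to:
            # Sample i is alone in its cluster: its score is taken as 0.
            scores.append(0.0)
            continue
        a = mean_to[own]
        b = min(v for c, v in mean_to.items() if c != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Four scalar "series" forming two well-separated groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
good = mean_silhouette(dist, [0, 0, 1, 1])  # labels match the groups
bad = mean_silhouette(dist, [0, 1, 0, 1])   # labels cut across the groups
```

Values close to 1 indicate compact, well-separated clusters. Values near 0 or below indicate overlapping or mismatched labels, so good is larger than bad here.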

References

[1] Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics 20: 53-65.
[2] Wikipedia entry on the Silhouette Coefficient

Examples

>>> import numpy
>>> from tslearn.clustering import silhouette_score
>>> from tslearn.generators import random_walks
>>> from tslearn.metrics import cdist_dtw
>>> numpy.random.seed(0)
>>> X = random_walks(n_ts=20, sz=16, d=1)
>>> labels = numpy.random.randint(2, size=20)
>>> silhouette_score(X, labels, metric="dtw")  # doctest: +ELLIPSIS
0.13383800...
>>> silhouette_score(X, labels, metric="euclidean")  # doctest: +ELLIPSIS
0.09126917...
>>> silhouette_score(X, labels, metric="softdtw")  # doctest: +ELLIPSIS
0.17953934...
>>> silhouette_score(X, labels, metric="softdtw",
...                  metric_params={"gamma": 2.})     # doctest: +ELLIPSIS
0.17591060...
>>> silhouette_score(cdist_dtw(X), labels,
...                  metric="precomputed")  # doctest: +ELLIPSIS
0.13383800...
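The normalization applied when metric="softdtw", sdtw_(x,y) := sdtw(x,y) - 1/2(sdtw(x,x)+sdtw(y,y)), matters because raw Soft-DTW of a series with itself is generally not zero. The sketch below illustrates this with a small pure-Python Soft-DTW for univariate series. It assumes a squared-difference ground cost and is an independent illustration, not tslearn's code; the names soft_dtw and normalized_soft_dtw are local to this example.

```python
import math


def _softmin(values, gamma):
    # Smoothed minimum: -gamma * log(sum(exp(-v / gamma))),
    # shifted by the true minimum for numerical stability.
    m = min(values)
    return m - gamma * math.log(
        sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW between two univariate series (squared-difference cost)."""
    inf = float("inf")
    n, m = len(x), len(y)
    # R[i][j] holds the soft alignment cost of x[:i] and y[:j].
    R = [[inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i][j] = cost + _softmin(
                (R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]), gamma)
    return R[n][m]


def normalized_soft_dtw(x, y, gamma=1.0):
    # sdtw_(x, y) := sdtw(x, y) - 1/2 (sdtw(x, x) + sdtw(y, y))
    return soft_dtw(x, y, gamma) - 0.5 * (
        soft_dtw(x, x, gamma) + soft_dtw(y, y, gamma))


x = [0.0, 1.0, 2.0, 1.0]
y = [0.0, 2.0, 4.0, 2.0]
raw_self = soft_dtw(x, x)              # not zero in general
norm_self = normalized_soft_dtw(x, x)  # zero by construction
norm_xy = normalized_soft_dtw(x, y, gamma=2.0)
```

Subtracting the two self-similarity terms makes the value for identical series exactly zero, which is what a silhouette computation expects of a distance-like quantity.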