silhouette_score

tslearn.clustering.silhouette_score(X, labels, metric=None, sample_size=None, metric_params=None, n_jobs=None, verbose=0, random_state=None, **kwds)

Compute the mean Silhouette Coefficient of all samples (cf. [1] and [2]).

Read more in the scikit-learn documentation.
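For reference, the coefficient of sample i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other members of its own cluster and b(i) is the smallest mean distance from i to the members of any other cluster [1]. The pure-Python sketch below computes the mean over all samples from a precomputed square distance matrix. The helper names are hypothetical, and the convention of scoring a sample that is alone in its cluster as 0 is borrowed from scikit-learn and assumed, not confirmed, to carry over here.

```python
def _avg(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def silhouette_samples_from_distances(D, labels):
    """Silhouette coefficient s(i) of every sample, given a square distance matrix D."""
    n = len(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("At least two distinct labels are required.")
    scores = []
    for i in range(n):
        own = labels[i]
        # a(i): mean distance to the other members of i's own cluster.
        same = [D[i][j] for j in range(n) if j != i and labels[j] == own]
        if not same:
            # Sample alone in its cluster: scored 0 (assumed scikit-learn convention).
            scores.append(0.0)
            continue
        a = _avg(same)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            _avg(D[i][j] for j in range(n) if labels[j] == c)
            for c in clusters
            if c != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(D, labels):
    """Mean silhouette coefficient over all samples."""
    return _avg(silhouette_samples_from_distances(D, labels))


# Usage: four points on a line forming two well-separated clusters.
points = [0.0, 1.0, 10.0, 11.0]
D = [[abs(p - q) for q in points] for p in points]
print(mean_silhouette(D, [0, 0, 1, 1]))  # about 0.8997
```

Swapping the labels to [0, 1, 0, 1] groups distant points together and drives the mean below zero, which is how the coefficient signals a poor clustering.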

Parameters:

X: array, shape [n_ts, n_ts] if metric == "precomputed", or [n_ts, sz, d] otherwise. Array of pairwise distances between time series, or a time series dataset.
labels: array, shape [n_ts]. Predicted labels for each time series.
metric: string, callable or None (default: None). The metric to use when calculating distances between time series. Should be one of {‘dtw’, ‘softdtw’, ‘euclidean’}, a callable distance function, or None. If ‘softdtw’ is passed, a normalized version of Soft-DTW is used, defined as sdtw_(x, y) := sdtw(x, y) - 1/2 (sdtw(x, x) + sdtw(y, y)). If X is the distance array itself, use metric="precomputed". If None, DTW is used.
sample_size: int or None (default: None). The number of samples to draw when computing the Silhouette Coefficient on a random subset of the data. If None, no sampling is used and all samples are included.
metric_params: dict or None (default: None). Parameter values for the chosen metric. For metrics that support parallel computation of the cross-distance matrix, an n_jobs key passed in metric_params is overridden by the n_jobs argument.
n_jobs: int or None (default: None). The number of jobs to run in parallel for cross-distance matrix computations. Ignored if the cross-distance matrix cannot be computed in parallel. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors. See scikit-learn's Glossary for more details.
verbose: int (default: 0). If nonzero, joblib progress messages are printed during the cross-distance matrix computation.
random_state: int, RandomState instance or None (default: None). The generator used to randomly select a subset of samples. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Only used when sample_size is not None.
**kwds: optional keyword parameters. Any further parameters are passed directly to the distance function, just as for the metric_params parameter.
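The ‘softdtw’ option relies on a normalized Soft-DTW. As an illustration only, the pure-Python sketch below implements the usual Soft-DTW recursion, in which the hard minimum of DTW is replaced by a smoothed minimum controlled by gamma, for univariate series with a squared-difference ground cost, then applies the normalization formula given for metric. The helper names, the 1-D restriction and the default gamma=1.0 are assumptions of this sketch; it is not tslearn's implementation.

```python
import math


def _softmin(values, gamma):
    """Smoothed minimum -gamma * log(sum(exp(-v / gamma))), shifted by min(values) for stability."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW between two 1-D sequences, using squared differences as the ground cost."""
    n, m = len(x), len(y)
    # R[i][j] holds the soft-minimal cost of aligning x[:i] with y[:j].
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i][j] = cost + _softmin((R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]), gamma)
    return R[n][m]


def normalized_soft_dtw(x, y, gamma=1.0):
    """sdtw_(x, y) := sdtw(x, y) - 1/2 (sdtw(x, x) + sdtw(y, y))."""
    return soft_dtw(x, y, gamma) - 0.5 * (soft_dtw(x, x, gamma) + soft_dtw(y, y, gamma))


x = [0.0, 1.0, 2.0, 1.0]
y = [0.0, 2.0, 1.0]
print(normalized_soft_dtw(x, x))  # 0.0: a series is at normalized distance 0 from itself
print(normalized_soft_dtw(x, y, gamma=2.0))
```

Because the normalization subtracts the self-comparison terms, a series is always at normalized distance 0 from itself, which plain Soft-DTW does not guarantee: for gamma > 0, sdtw(x, x) is generally negative.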

Returns:

silhouette: float. Mean Silhouette Coefficient for all samples.
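When sample_size is set, the mean is taken over a random subset of the time series rather than over all of them. The sketch below illustrates that sampling step on a precomputed distance matrix: it keeps the rows and columns of the sampled series, and the silhouette is then computed on that sub-matrix as usual. The helper subsample_distances is hypothetical, and it uses Python's random module rather than a NumPy RandomState, so it only accepts an int seed or None and will not reproduce the subsets tslearn draws.

```python
import random


def subsample_distances(D, labels, sample_size=None, random_state=None):
    """Restrict a square distance matrix D and its labels to a random subset of samples."""
    n = len(labels)
    if sample_size is None:
        # No sampling: every sample is kept.
        idx = list(range(n))
    else:
        # Draw sample_size distinct indices; an int seed makes the draw reproducible.
        rng = random.Random(random_state)
        idx = sorted(rng.sample(range(n), sample_size))
    sub_D = [[D[i][j] for j in idx] for i in idx]
    sub_labels = [labels[i] for i in idx]
    return sub_D, sub_labels


points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]
D = [[abs(p - q) for q in points] for p in points]
sub_D, sub_labels = subsample_distances(D, labels, sample_size=4, random_state=0)
print(len(sub_D), sub_labels)  # 4 rows; which labels survive depends on the seed
```

With a small sample_size the subset can end up containing a single cluster, in which case the silhouette is undefined, so the sample should be large enough to cover every cluster with high probability.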

References

[1] Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics 20: 53-65.

[2] Wikipedia entry on the Silhouette Coefficient.

Examples

>>> import numpy
>>> from tslearn.clustering import silhouette_score
>>> from tslearn.generators import random_walks
>>> from tslearn.metrics import cdist_dtw, dtw
>>> numpy.random.seed(0)
>>> X = random_walks(n_ts=20, sz=16, d=1)
>>> labels = numpy.random.randint(2, size=20)
>>> float(silhouette_score(X, labels, metric="dtw"))
0.13383800...
>>> float(silhouette_score(X, labels, metric="euclidean"))
0.09126917...
>>> float(silhouette_score(X, labels, metric="softdtw"))
0.17953934...
>>> float(silhouette_score(X, labels, metric="softdtw",
...                        metric_params={"gamma": 2.}))
0.17591060...
>>> float(silhouette_score(cdist_dtw(X), labels,
...                        metric="precomputed"))
0.13383800...
>>> float(silhouette_score(X, labels, metric=dtw))
0.13383800...