Integration with other Python packages¶
tslearn
is a general-purpose Python machine learning library for time
series that offers tools for pre-processing and feature extraction as well as
dedicated models for clustering, classification and regression.
To ensure compatibility with more specific Python packages, we provide utilities
to convert data sets from and to other formats.
tslearn.utils.to_time_series_dataset()
is a general function that
transforms an array-like object into a three-dimensional array of shape
(n_ts, sz, d)
with the following conventions:
the fist axis is the sample axis,
n_ts
being the number of time series;the second axis is the time axis,
sz
being the maximum number of time points;the third axis is the dimension axis,
d
being the number of dimensions.
This is how a data set of time series is represented in tslearn
.
The following sections briefly explain how to transform a data set from
tslearn
to another supported Python package and vice versa.
scikit-learn¶
scikit-learn is a popular Python package for
machine learning.
tslearn.utils.to_sklearn_dataset()
converts a data set from tslearn
format to scikit-learn
format. To convert a data set from
scikit-learn
, you can use tslearn.utils.to_time_series_dataset()
.
>>> from tslearn.utils import to_sklearn_dataset
>>> to_sklearn_dataset([[1, 2], [1, 4, 3]])
array([[ 1., 2., nan],
[ 1., 4., 3.]])
>>> to_time_series_dataset([[ 1., 2., None], [ 1., 4., 3.]])
array([[[ 1.],
[ 2.],
[nan]],
[[ 1.],
[ 4.],
[ 3.]]])
pyts¶
pyts is a Python package dedicated to time
series classification.
tslearn.utils.to_pyts_dataset()
and tslearn.utils.from_pyts_dataset()
allow users to convert a data set from tslearn
format to pyts
format
and vice versa.
>>> from tslearn.utils import from_pyts_dataset, to_pyts_dataset
>>> from_pyts_dataset([[1, 2], [1, 4]])
array([[[1],
[2]],
[[1],
[4]]])
>>> to_pyts_dataset([[[1], [2]], [[1], [4]]])
array([[1., 2.],
[1., 4.]])
seglearn¶
seglearn is a python package for machine
learning time series or sequences.
tslearn.utils.to_seglearn_dataset()
and tslearn.utils.from_seglearn_dataset()
allow users to convert a data set from tslearn
format to seglearn
format
and vice versa.
>>> from tslearn.utils import from_seglearn_dataset, to_seglearn_dataset
>>> from_seglearn_dataset([[1, 2], [1, 4, 3]])
array([[[ 1.],
[ 2.],
[nan]],
[[ 1.],
[ 4.],
[ 3.]]])
>>> to_seglearn_dataset([[[1], [2], [None]], [[1], [4], [3]]])
array([array([[1.],
[2.]]),
array([[1.],
[4.],
[3.]])], dtype=object)
stumpy¶
stumpy is a powerful and scalable Python
library for computing a Matrix Profile, which can be used for a variety of time
series data mining tasks.
tslearn.utils.to_stumpy_dataset()
and tslearn.utils.from_stumpy_dataset()
allow users to convert a data set from tslearn
format to stumpy
format
and vice versa.
>>> import numpy as np
>>> from tslearn.utils import from_stumpy_dataset, to_stumpy_dataset
>>> from_stumpy_dataset([np.array([1, 2]), np.array([1, 4, 3])])
array([[[ 1.],
[ 2.],
[nan]],
[[ 1.],
[ 4.],
[ 3.]]])
>>> to_stumpy_dataset([[[1], [2], [None]], [[1], [4], [3]]])
[array([1., 2.]), array([1., 4., 3.])]
sktime¶
sktime is a scikit-learn
compatible Python toolbox for learning with time series.
tslearn.utils.to_sktime_dataset()
and tslearn.utils.from_sktime_dataset()
allow users to convert a data set from tslearn
format to sktime
format
and vice versa.
pandas
is a required dependency to use these functions.
>>> import pandas as pd
>>> from tslearn.utils import from_sktime_dataset, to_sktime_dataset
>>> df = pd.DataFrame()
>>> df["dim_0"] = [pd.Series([1, 2]), pd.Series([1, 4, 3])]
>>> from_sktime_dataset(df)
array([[[ 1.],
[ 2.],
[nan]],
[[ 1.],
[ 4.],
[ 3.]]])
>>> to_sktime_dataset([[[1], [2], [None]], [[1], [4], [3]]]).shape
(2, 1)
pyflux¶
pyflux is a library for time series analysis
and prediction.
tslearn.utils.to_pyflux_dataset()
and tslearn.utils.from_pyflux_dataset()
allow users to convert a data set from tslearn
format to pyflux
format
and vice versa.
pandas
is a required dependency to use these functions.
>>> import pandas as pd
>>> from tslearn.utils import from_pyflux_dataset, to_pyflux_dataset
>>> df = pd.DataFrame([1, 2], columns=["dim_0"])
>>> from_pyflux_dataset(df)
array([[[1.],
[2.]]])
>>> to_pyflux_dataset([[[1], [2]]]).shape
(2, 1)
tsfresh¶
tsfresh is a python package automatically
calculating a large number of time series characteristics.
tslearn.utils.to_tsfresh_dataset()
and tslearn.utils.from_tsfresh_dataset()
allow users to convert a data set from tslearn
format to tsfresh
format
and vice versa.
pandas
is a required dependency to use these functions.
>>> import pandas as pd
>>> from tslearn.utils import from_tsfresh_dataset, to_tsfresh_dataset
>>> df = pd.DataFrame([[0, 0, 1.0],
... [0, 1, 2.0],
... [1, 0, 1.0],
... [1, 1, 4.0],
... [1, 2, 3.0]], columns=['id', 'time', 'dim_0'])
>>> from_tsfresh_dataset(df)
array([[[ 1.],
[ 2.],
[nan]],
[[ 1.],
[ 4.],
[ 3.]]])
>>> to_tsfresh_dataset([[[1], [2], [None]], [[1], [4], [3]]]).shape
(5, 3)
cesium¶
cesium is an open-source platform for time series inference.
tslearn.utils.to_cesium_dataset()
and tslearn.utils.from_cesium_dataset()
allow users to convert a data set from tslearn
format to cesium
format
and vice versa.
cesium
is a required dependency to use these functions.
>>> from tslearn.utils import from_cesium_dataset, to_cesium_dataset
>>> from cesium.data_management import TimeSeries
>>> from_cesium_dataset([TimeSeries(m=[1, 2]), TimeSeries(m=[1, 4, 3])])
array([[[ 1.],
[ 2.],
[nan]],
[[ 1.],
[ 4.],
[ 3.]]])
>>> len(to_cesium_dataset([[[1], [2], [None]], [[1], [4], [3]]]))
2