Matrix Profile
The Matrix Profile, \(MP\), is a new time series computed from an input time series \(T\) and a subsequence length \(m\). \(MP_i\) is the minimal distance from the query subsequence \(T_{i\rightarrow i+m}\) to any other subsequence in \(T\) [1]. Because the distance from the query subsequence to itself is zero, the region \(T_{i-\frac{m}{4}\rightarrow i+\frac{m}{4}}\) around the query is treated as an exclusion zone and is ignored when taking the minimum. To construct the Matrix Profile, a distance profile, similar to the distance calculation used to transform time series into their shapelet-transform space, is calculated for each subsequence, as illustrated below:
For each segment, the distances to all subsequences of the time series are calculated and the minimal distance that does not correspond to the original location of the segment (where the distance is zero) is returned.
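The construction described above can be sketched as a brute-force computation. This is a minimal illustration in plain Python, not the tslearn or STUMPY implementation: the names `znorm` and `naive_matrix_profile` are hypothetical, the per-segment z-normalization is an assumption, and the exclusion zone is taken as all segments whose start index lies within `ceil(m / 4)` positions of the query, which is one possible rounding of the \(\frac{m}{4}\) rule.

```python
import math


def znorm(segment):
    """Z-normalize a segment; a constant segment maps to all zeros."""
    mean = sum(segment) / len(segment)
    std = math.sqrt(sum((x - mean) ** 2 for x in segment) / len(segment))
    if std == 0.0:
        return [0.0] * len(segment)
    return [(x - mean) / std for x in segment]


def naive_matrix_profile(ts, m, scale=True):
    """Brute-force matrix profile of a univariate series (list of floats)."""
    n_segments = len(ts) - m + 1
    segments = [ts[i:i + m] for i in range(n_segments)]
    if scale:
        segments = [znorm(s) for s in segments]
    radius = math.ceil(m / 4)  # assumed half-width of the exclusion zone
    profile = []
    for i in range(n_segments):
        best = math.inf  # stays infinite if every candidate is excluded
        for j in range(n_segments):
            if abs(i - j) <= radius:
                continue  # skip the query itself and its trivial matches
            best = min(best, math.dist(segments[i], segments[j]))
        profile.append(best)
    return profile


# A series with period 4: every subsequence of length 4 repeats exactly
# four steps away, outside the exclusion zone.
ts = [0, 1, 2, 1] * 3
profile = naive_matrix_profile(ts, m=4, scale=False)
print(profile)
```

On this periodic toy series every entry of the profile is zero, because each subsequence has an exact copy outside its exclusion zone. The double loop makes the cost quadratic in the number of subsequences, which is why faster implementations matter for long series.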
Implementation
The Matrix Profile implementation provided in tslearn uses numpy or wraps around STUMPY [2]. Three different versions are available:
numpy: a slow implementation
stump: a fast CPU version, which requires STUMPY to be installed
gpu_stump: the fastest version, which requires STUMPY to be installed and a GPU
Possible Applications
The Matrix Profile supports many applications, which are well documented on the page created by the original authors [3]. These include motif and shapelet extraction, discord detection, and earthquake detection, among many others.
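As a rough illustration of two of these applications, the lowest value of a Matrix Profile marks a subsequence with a very close match elsewhere in the series (a motif candidate), while the highest value marks a subsequence unlike any other (a discord candidate). The helper `motif_and_discord` and the profile values below are hypothetical and only sketch this reading of the profile.

```python
import math


def motif_and_discord(profile):
    """Return (motif_index, discord_index) from a list of profile values."""
    # Ignore infinite entries, which mean no admissible match was found.
    candidates = [i for i, d in enumerate(profile) if math.isfinite(d)]
    if not candidates:
        raise ValueError("profile has no finite entries")
    motif_idx = min(candidates, key=lambda i: profile[i])    # closest match
    discord_idx = max(candidates, key=lambda i: profile[i])  # most isolated
    return motif_idx, discord_idx


# Made-up profile values for illustration only.
example_profile = [0.4, 0.1, 0.3, 2.5, 0.2, 0.1]
motif_idx, discord_idx = motif_and_discord(example_profile)
print(motif_idx, discord_idx)
```

Ties are broken in favor of the earliest index, so the motif candidate here is position 1 rather than position 5.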