2 Multivariate time series
A multivariate time series with n attributes is defined as $T = [T_1, T_2, \ldots, T_n]$,
where $T_i[t]$ is the value of the i-th attribute at timestamp t. If l is the length
of the time series, then it is represented by an l × n matrix.
A multivariate shapelet is defined as $f = (s, l, \Delta, c_f)$. Here $s = [s_1, s_2, \ldots, s_n]$
is a vector in which $s_i[t]$ is the value of the i-th attribute of the shapelet at timestamp t,
l is the length of the shapelet, $\Delta$ is the n-dimensional distance threshold, and $c_f$ is
the class of the shapelet.
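The two definitions above can be pictured as array shapes. The following is a minimal sketch, assuming NumPy arrays as storage; the class name `MultivariateShapelet`, its field names, and the sample values are illustrative and do not come from the original method.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MultivariateShapelet:
    """Container mirroring f = (s, l, Delta, c_f); field names are illustrative."""

    s: np.ndarray      # shape (l, n): s[t, i] is attribute i at timestamp t
    delta: np.ndarray  # shape (n,): one distance threshold per attribute
    label: str         # c_f, the class of the shapelet

    @property
    def length(self) -> int:
        # l is implied by the number of rows of s, so it is derived, not stored.
        return self.s.shape[0]


# A time series of length l = 5 with n = 2 attributes, stored as an l x n matrix.
T = np.array([[0.0, 5.0], [1.0, 5.0], [2.0, 7.0], [3.0, 9.0], [4.0, 9.0]])

# A shapelet cut from rows 1..3 of T; the threshold values are placeholders.
f = MultivariateShapelet(s=T[1:4], delta=np.array([0.5, 1.0]), label="A")
```

Deriving the length from `s` rather than storing it separately is a design choice of this sketch; it avoids the two values drifting apart.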
The distance between a multivariate shapelet f and a multivariate time series T
is a vector defined as: $dist(s, T) = [dist(s_1, T_1), dist(s_2, T_2), \ldots, dist(s_n, T_n)]$.
Here $dist(s_i, T_i)$ is the minimum distance between $s_i$ and all subsequences of $T_i$ of the
same length as $s_i$, i.e. the minimal distance between s and T in each dimension.
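As an illustration of this per-dimension minimum, the sketch below slides the shapelet over each attribute of the series independently. The text does not specify the underlying point-wise distance, so the Euclidean distance is an assumption here, as are the function names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def min_subsequence_distance(s_i, t_i):
    """Smallest distance between s_i and any same-length window of t_i.

    Assumption: Euclidean distance between the shapelet and each window.
    """
    s_i = np.asarray(s_i, dtype=float)
    t_i = np.asarray(t_i, dtype=float)
    # One row per subsequence of t_i that has the same length as s_i.
    windows = sliding_window_view(t_i, len(s_i))
    return float(np.sqrt(((windows - s_i) ** 2).sum(axis=1)).min())


def shapelet_distance(s, T):
    """Distance vector [dist(s_1, T_1), ..., dist(s_n, T_n)].

    s has shape (l_s, n) and T has shape (l, n), one column per attribute.
    """
    s = np.asarray(s, dtype=float)
    T = np.asarray(T, dtype=float)
    return np.array(
        [min_subsequence_distance(s[:, i], T[:, i]) for i in range(T.shape[1])]
    )


T = np.array([[0.0, 5.0], [1.0, 5.0], [2.0, 7.0], [3.0, 9.0]])
s = np.array([[1.0, 6.0], [2.0, 8.0]])
d = shapelet_distance(s, T)  # one entry per attribute
```

Note that the best-matching window may start at a different timestamp in each dimension, since the minimum is taken per attribute.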
The distance threshold $\Delta = [\delta_1, \delta_2, \ldots, \delta_n]$ is computed using Algorithm 1.1
(additional file: ecmts algorithms).
The distance threshold divides the dataset into two groups: a group $D_L$ containing only time series of the same class as the shapelet, and a group $D_R$ of time series
with a different class.
The entropy of a dataset is computed as $-\sum_{c \in C} \frac{m_c}{M} \log\left(\frac{m_c}{M}\right)$, where $m_c$ is the
number of time series of class c and M is the number of time series in the dataset.
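The information gain of the shapelet f is computed as:
$IG = \mathrm{Entropy} - \frac{M_L}{M} E_L - \frac{M_R}{M} E_R$,
where Entropy is the entropy of the current dataset, $E_L$ and $E_R$ are the entropies of $D_L$ and $D_R$, and $M_L$ and $M_R$ are the numbers of time series in $D_L$ and $D_R$.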
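The entropy and information gain can be checked numerically with a short sketch. It assumes the natural logarithm (the text does not state the base) and takes the class labels of the two groups as plain lists; the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy -sum_c (m_c / M) * log(m_c / M) of a list of class labels."""
    M = len(labels)
    if M == 0:
        return 0.0  # convention assumed here for an empty group
    return -sum((m_c / M) * math.log(m_c / M) for m_c in Counter(labels).values())


def information_gain(labels_left, labels_right):
    """Entropy of the whole dataset minus the size-weighted entropies of the groups."""
    labels_left, labels_right = list(labels_left), list(labels_right)
    M_L, M_R = len(labels_left), len(labels_right)
    M = M_L + M_R
    if M == 0:
        return 0.0
    return (
        entropy(labels_left + labels_right)
        - (M_L / M) * entropy(labels_left)
        - (M_R / M) * entropy(labels_right)
    )


# A perfect split of a balanced two-class dataset recovers the full entropy, log(2).
perfect = information_gain(["a", "a"], ["b", "b"])
# A split that leaves both groups as mixed as the whole dataset gains nothing.
useless = information_gain(["a", "b"], ["a", "b"])
```

With a different logarithm base the values scale by a constant factor, so the ranking of shapelets by information gain is unchanged.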
3 Multivariate shapelet extraction
The learning algorithm is described in Section 1 of the additional file (ecmts algorithms).
The extraction algorithm iterates over each time series and extracts all multivariate
shapelets whose length lies in the specified range. For each multivariate shapelet, it computes the distance to every time series. Since each distance is a vector
of length N, the distances between a multivariate shapelet and all time series in a
dataset of M time series form an N × M matrix.
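The enumeration of candidates and the resulting distance matrix can be sketched as follows. This is not the algorithm from the additional file: it assumes that N equals the number of attributes n, that the distance is Euclidean, and that every series is at least as long as the shapelet. The function names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def candidates(T, min_len, max_len):
    """Yield every window of T (shape (l, n)) with length in [min_len, max_len]."""
    for length in range(min_len, min(max_len, T.shape[0]) + 1):
        for start in range(T.shape[0] - length + 1):
            yield T[start:start + length]


def distance_matrix(s, dataset):
    """N x M matrix whose entry (i, j) is the minimum distance between
    attribute i of the shapelet s and attribute i of the j-th time series."""
    columns = []
    for T in dataset:
        # windows has shape (num_windows, n, len(s)); s.T has shape (n, len(s)).
        windows = sliding_window_view(T, len(s), axis=0)
        d = np.sqrt(((windows - s.T) ** 2).sum(axis=2))  # (num_windows, n)
        columns.append(d.min(axis=0))                    # per-attribute minimum
    return np.stack(columns, axis=1)


# Two series with n = 2 attributes and different lengths.
dataset = [np.arange(8.0).reshape(4, 2), np.ones((5, 2))]
cands = list(candidates(dataset[0], min_len=2, max_len=3))
D = distance_matrix(cands[0], dataset)  # shape (2, 2): attributes x series
```

The column for the series a candidate was cut from is all zeros, because the candidate matches itself exactly.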
Afterward, the method computes the distance threshold and utility score for each
multivariate shapelet, then selects the shapelets with the highest information gain
that cover the time series in the learning dataset.
To compute the distance threshold of a shapelet, we need to provide a way to
compare two multi-dimensional distances. Therefore, two multidimensional distances