
While the global structure of a time-series dataset can be investigated by plotting the data matrix (`TS_plot_DataMatrix`) or a low-dimensional representation of it (`TS_PlotLowDim`), it is sometimes more informative to retrieve and visualize the relationships among the nearest neighbors of a particular time series of interest.

The *hctsa* framework provides a way to easily compute distances between pairs of time series, e.g., as a Euclidean distance between their normalized feature vectors. This allows very different time series (in terms of their origin, their method of recording and measurement, and their number of samples) to be compared straightforwardly according to their properties, measured by the algorithms in our *hctsa* library.
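To make the idea concrete, here is a minimal sketch of ranking neighbors by Euclidean distance between feature vectors. It uses only base MATLAB and a made-up `featureMatrix` (rows as time series, columns as normalized features); the variable names and values are illustrative assumptions, not part of *hctsa*.

```matlab
% Hypothetical toy data: 5 time series (rows) x 3 normalized features (columns).
% These values are made up for illustration and are not hctsa output.
featureMatrix = [0.10 0.20 0.30;
                 0.12 0.18 0.33;
                 0.90 0.80 0.70;
                 0.15 0.25 0.28;
                 0.85 0.75 0.95];
targetIndex = 1;    % row of the time series of interest
numNeighbors = 3;   % how many neighbors to retrieve

% Euclidean distance from the target's feature vector to every row
% (implicit expansion requires MATLAB R2016b or later)
differences = featureMatrix - featureMatrix(targetIndex,:);
distToTarget = sqrt(sum(differences.^2,2));

% Sort ascending; here the target itself comes first, at distance zero
[sortedDist,sortedIndex] = sort(distToTarget,'ascend');
neighborIndex = sortedIndex(2:numNeighbors+1);
neighborDist = sortedDist(2:numNeighbors+1);

for k = 1:numNeighbors
    fprintf('Neighbor %u: row %u (distance = %.3f)\n',k,neighborIndex(k),neighborDist(k));
end
```

In practice, *hctsa* takes care of the normalization, distance computation, and ranking across the full feature set.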

For this, we use the `TS_SimSearch` function, specifying the ID of the time series of interest (i.e., the `ID` field of the `TimeSeries` structure) as the first input and the number of neighbors with the `'numNeighbors'` input specifier (default: 20). By default, data is loaded from `HCTSA_N.mat`, but a custom source can be specified using the `'whatDataFile'` input specifier (e.g., `TS_SimSearch('whatDataFile','HCTSA_custom.mat')`).

After specifying the target and how many neighbors to retrieve, `TS_SimSearch` prints the list of neighbors and their distances to the screen, and also provides a range of plotting options to visualize the neighbors. The plots to produce are specified as a cell array using the `'whatPlots'` input.

TS_SimSearch('whatPlots',{'matrix'});

To investigate the pairwise relationships between all retrieved neighbors, specify the `'matrix'` option of the `TS_SimSearch` function. An example output using a publicly available EEG dataset, retrieving 14 neighbors of the time series with `ID = 1` using `TS_SimSearch(1,'whatPlots',{'matrix'},'numNeighbors',14)`, is shown below:

The specified target time series (`ID = 1`) is shown as a white star, and all 14 neighbors are shown, labeled on the left of the plot with their respective IDs and a 100-sample subset of their time traces.

Pairwise distances are computed between all pairs of time series (as a Euclidean distance between their feature vectors), and plotted using color, from low (red = more similar pairs of time series) to high (blue = more different pairs of time series).

Because this dataset contains 3 classes that were previously labeled (using `TS_LabelGroups` as: `TS_LabelGroups({'seizure','eyesOpen','eyesClosed'})`), the function shows these class assignments using color labels to the left of the plot (purple, green, and orange in this case).

In this case we see that the purple and green classes are relatively similar under this distance metric (eyes open and eyes closed), whereas the orange time series (seizure) are distinguished.
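The quantity shown in the matrix plot can be sketched in a few lines of base MATLAB. The example below builds a symmetric matrix of pairwise Euclidean distances from a small, made-up `featureMatrix`; it is an illustrative assumption of how such a matrix can be computed, not the code used inside `TS_SimSearch`.

```matlab
% Hypothetical toy data: 4 time series (rows) x 2 features (columns)
featureMatrix = [0 0;
                 3 4;
                 0 1;
                 6 8];
numSeries = size(featureMatrix,1);

% Fill the symmetric matrix of pairwise Euclidean distances
distMatrix = zeros(numSeries);
for i = 1:numSeries
    for j = i+1:numSeries
        d = sqrt(sum((featureMatrix(i,:) - featureMatrix(j,:)).^2));
        distMatrix(i,j) = d;
        distMatrix(j,i) = d;
    end
end
disp(distMatrix)
```

Passing `distMatrix` to `imagesc` and adding a `colorbar` gives a basic color-coded view of the same information.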

Another way to visualize the similarity (under our feature-based distance metric) of all pairs of neighbors is using a network visualization. This is specified as:

TS_SimSearch(1,'whatPlots',{'network'});

which produces something like the following:

The strongest links are visualized as blue lines (by default, the top 40% of strongest links are plotted, cf. the legend showing 0.9, 0.8, 0.7, and 0.6 for the top 10%, 20%, 30%, and 40% of links, respectively).

The target is distinguished (as purple in this case), and the other classes of time series are shown using color, with names and time-series segments annotated. Again, you can see that the EEG time series during seizure (blue) are distinguished from eyes open (red) and eyes closed (green).
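As a rough illustration of keeping only the strongest links, the sketch below selects the 40% of pairs with the smallest distances from a made-up distance matrix. Treating "strongest" as "smallest distance" and thresholding by rank are assumptions made for this example; the exact rule used by the network plot is not described here.

```matlab
% Hypothetical symmetric distance matrix for 5 time series (made-up values)
distMatrix = [0 1 4 6 9;
              1 0 2 7 8;
              4 2 0 3 10;
              6 7 3 0 5;
              9 8 10 5 0];
propKeep = 0.4;   % proportion of links to keep (assumed to mirror the 40% default)

% Each unordered pair appears once in the strict upper triangle
isUpper = triu(true(size(distMatrix)),1);
linkDist = distMatrix(isUpper);
numKeep = round(propKeep*numel(linkDist));

% Assumption: the strongest links are those with the smallest distances
sortedDist = sort(linkDist,'ascend');
threshold = sortedDist(numKeep);
keepLink = isUpper & (distMatrix <= threshold);

[rowIndex,colIndex] = find(keepLink);
for k = 1:numel(rowIndex)
    fprintf('Link %u -- %u (distance = %g)\n',rowIndex(k),colIndex(k),distMatrix(rowIndex(k),colIndex(k)));
end
```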

TS_SimSearch(1,'whatPlots',{'scatter'});

The scatter setting visualizes the relationship between the target and each of 12 time series with the most similar properties to the target. Each subplot is a scatter of the (normalized) outputs of each feature for the specified target (x-axis) and the match (y-axis). An example is shown below.

Multiple output plots can be produced simultaneously by specifying several plot types as follows:

TS_SimSearch(1,'whatPlots',{'matrix','network','scatter'})

This produces a plot of each type.

Note that pairwise distances can be pre-computed and saved in the `HCTSA*.mat` file using `TS_PairwiseDist` for custom distance metrics (which is done by default in `TS_cluster` for datasets containing fewer than 1000 objects). `TS_SimSearch` checks for this information in the specified input data (containing the `ts_clust` or `op_clust` structure), and uses it to retrieve neighbors. If distances have not previously been computed, distances from the target are computed within `TS_SimSearch` as Euclidean distances (time series) or absolute correlation distances (operations) between feature vectors.
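For readers unfamiliar with absolute correlation distance, the sketch below computes it between the columns (operations) of a small, made-up `featureMatrix` using base MATLAB's `corrcoef`. It assumes the common definition, one minus the absolute value of the Pearson correlation; the exact form used inside *hctsa* may differ.

```matlab
% Hypothetical toy data: 5 time series (rows) x 3 operations (columns)
featureMatrix = [1 2 5;
                 2 4 4;
                 3 6 1;
                 4 8 3;
                 5 10 2];

% corrcoef treats each column as a variable, giving a 3 x 3 correlation matrix
R = corrcoef(featureMatrix);

% Assumed definition: strongly correlated or anticorrelated columns are close
corrDist = 1 - abs(R);
disp(corrDist)
```

Under this definition, two operations whose outputs are perfectly correlated or perfectly anticorrelated across the dataset have a distance of zero.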