Suppose TS_TopFeatures
is run to find features that accurately distinguish groups of time series, yielding a list of features like the following:

Functions like TS_TopFeatures
are helpful in showing us how these different types of features might cluster into groups that measure similar properties (as shown in the previous section). This lets us inspect sets of similar, inter-correlated features together as a group. But even when we have isolated such a group, how can we start to interpret and understand what these features are actually measuring? Some features in the list may be easy to interpret directly (e.g., rms
in the list above is simply the root-mean-square of the distribution of time-series values), and others have clues in the name (e.g., features starting with WL_coeffs
measure wavelet coefficients, features starting with EN_mse
correspond to measuring the multiscale entropy, mse, and features starting with FC_LocalSimple_mean
are related to time-series forecasting using local means of the time series). Below we outline a procedure for how a user can go from a time-series feature selected by hctsa towards a deeper understanding of the type of algorithm that feature is derived from, how that algorithm performs across the dataset, and thus how it can provide interpretable information about a specific time-series dataset.

Consider the feature FC_LocalSimple_mean3_taures (the prefix FC_
stands for forecasting, and the function FC_LocalSimple
is the one that produces this feature, which, as the name suggests, performs simple local time-series prediction). We can use the feature ID (3016
) provided in square brackets to get information from the Operations
metadata table:

The text before the dot, '.', in the CodeString
field (FC_LocalSimple_mean3
) tells us the name that hctsa uses to describe the Matlab function and its unique set of inputs that produces this feature, whereas the text following the dot, '.', in the CodeString
field (taures
) tells us the field of the output structure produced by the Matlab function that was run. We can use the MasterID
to get more information about the code that was run using the MasterOperations
metadata table:

This tells us that the code that was run to produce our feature was FC_LocalSimple(y,'mean',3). We can get information about this function at the command line by running a help
command:

We can also inspect the code file for FC_LocalSimple
directly for more information. Like all code files for computing time-series features, FC_LocalSimple.m
is located in the Operations directory of the hctsa repository. Inspecting the code file, we see that running FC_LocalSimple(y,'mean',3)
does forecasting using local estimates of the time-series mean (since the second input to FC_LocalSimple
, forecastMeth,
is set to 'mean'
), using the previous three time-series values to make the prediction (since the third input to FC_LocalSimple
, trainLength,
is set to 3
).

To understand the specific output that was highlighted by our TS_TopFeatures
analysis, we need to look for the output labeled taures
of the output structure produced by FC_LocalSimple
. We discover the following relevant lines of code in FC_LocalSimple.m
:

These lines show that FC_LocalSimple
then outputs some features on whether there is any residual autocorrelation structure in the residuals of the rolling predictions (the outputs labeled ac1
, ac2
, and our output of interest: taures
). The code shows that this taures
output computes the CO_FirstZero
of the residuals, which measures the first zero of the autocorrelation function (cf. help CO_FirstZero
). When the local mean prediction still leaves a lot of autocorrelation structure in the residuals, our feature, FC_LocalSimple_mean3_taures
, will thus take a high value.

Using TS_FeatureSummary
to visualize how a given feature orders time series, including across labeled groups, can be very useful for feature interpretation.

Candidate features to interpret can be identified using TS_TopFeatures
(cf. Finding informative features), or by searching for features with similar behavior on the dataset to a given feature of interest (cf. Finding nearest neighbors). In a specific domain context, the analyst typically needs to decide on the trade-off between more complicated features that may have slightly higher in-sample performance on a given task, and simpler, more interpretable features that may help guide domain understanding. The procedures outlined above are typically the first step to understanding a time-series analysis algorithm, and its relationship to alternatives that have been developed across science.
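
As a concrete illustration of the taures output discussed earlier, the following minimal sketch reproduces the idea in plain Matlab, without calling any hctsa functions. It is not the hctsa implementation. The example series, the sign convention for the residuals, the hand-written autocorrelation estimate, and the rule used for the first zero (the first lag at which the autocorrelation is no longer positive) are all assumptions made for illustration, and may differ in detail from FC_LocalSimple and CO_FirstZero.

```matlab
% Illustrative sketch only: this is not the hctsa implementation.
% Example series (assumption): a noise-free sinusoid with a period of 20 samples.
y = sin(2*pi*(1:200)'/20);

trainLength = 3;   % number of previous values averaged for each prediction
N = length(y);

% Rolling local-mean forecast: predict y(t) as the mean of the previous
% trainLength values and store the residual (prediction minus observed value).
res = zeros(N - trainLength, 1);
for t = (trainLength + 1):N
    res(t - trainLength) = mean(y(t - trainLength:t - 1)) - y(t);
end

% Sample autocorrelation of the residuals at lags 1, 2, ..., maxLag.
r = res - mean(res);
maxLag = length(r) - 1;
acf = zeros(maxLag, 1);
for lag = 1:maxLag
    acf(lag) = sum(r(1:end - lag) .* r(1 + lag:end)) / sum(r.^2);
end

% First lag at which the autocorrelation is no longer positive
% (convention assumed here; falls back to maxLag if there is no such lag).
tau = find(acf <= 0, 1, 'first');
if isempty(tau)
    tau = maxLag;
end

fprintf('First zero of the residual autocorrelation: lag %d\n', tau);
```

For this slowly varying series, a three-point local mean cannot remove the oscillation, so the residuals remain autocorrelated over several lags and the first zero occurs well beyond lag 1. Replacing y with a rapidly fluctuating series is a simple way to see how the value changes.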