The BOSS is concerned with time series classification in the presence of noise

Similarity search is one of the most important and probably best studied methods for data mining. In the context of time series analysis it reaches its limits when it comes to mining raw datasets. The raw time series data may be recorded at variable lengths, be noisy, or are composed of repetitive s...

Full description

Saved in:
Bibliographic Details
Published inData mining and knowledge discovery Vol. 29; no. 6; pp. 1505 - 1530
Main Author Schaefer, Patrick
Format Journal Article
LanguageEnglish
Published New York Springer US 01.11.2015
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Similarity search is one of the most important and probably best studied methods for data mining. In the context of time series analysis it reaches its limits when it comes to mining raw datasets. The raw time series data may be recorded at variable lengths, be noisy, or are composed of repetitive substructures. These build a foundation for state of the art search algorithms. However, noise has been paid surprisingly little attention to and is assumed to be filtered as part of a preprocessing step carried out by a human. Our Bag-of-SFA-Symbols (BOSS) model combines the extraction of substructures with the tolerance to extraneous and erroneous data using a noise reducing representation of the time series. We show that our BOSS ensemble classifier improves the best published classification accuracies in diverse application areas and on the official UCR classification benchmark datasets by a large margin.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:1384-5810
1573-756X
DOI:10.1007/s10618-014-0377-7