Learning Optimal Time Series Combination and Pre-Processing by Smart Joins

In industrial applications of data science and machine learning, most of the steps of a typical pipeline focus on optimizing measures of model fitness to the available data. Data preprocessing, instead, is often ad-hoc, and not based on the optimization of quantitative measures. This paper proposes...

Full description

Saved in:
Bibliographic Details
Published inApplied sciences Vol. 10; no. 18; p. 6346
Main Authors Gil, Amaia, Quartulli, Marco, Olaizola, Igor G., Sierra, Basilio
Format Journal Article
LanguageEnglish
Published Basel MDPI AG 01.09.2020
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:In industrial applications of data science and machine learning, most of the steps of a typical pipeline focus on optimizing measures of model fitness to the available data. Data preprocessing, instead, is often ad-hoc, and not based on the optimization of quantitative measures. This paper proposes the use of optimization in the preprocessing step, specifically studying a time series joining methodology, and introduces an error function to measure the adequateness of the joining. Experiments show how the method allows monitoring preprocessing errors for different time slices, indicating when a retraining of the preprocessing may be needed. Thus, this contribution helps quantifying the implications of data preprocessing on the result of data analysis and machine learning methods. The methodology is applied to two case studies: synthetic simulation data with controlled distortions, and a real scenario of an industrial process.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2076-3417
2076-3417
DOI:10.3390/app10186346