Learning Optimal Time Series Combination and Pre-Processing by Smart Joins

Bibliographic Details
Published in Applied Sciences Vol. 10; no. 18; p. 6346
Main Authors Gil, Amaia; Quartulli, Marco; Olaizola, Igor G.; Sierra, Basilio
Format Journal Article
Language English
Published Basel MDPI AG 01.09.2020
Abstract In industrial applications of data science and machine learning, most of the steps of a typical pipeline focus on optimizing measures of model fitness to the available data. Data preprocessing, instead, is often ad hoc and not based on the optimization of quantitative measures. This paper proposes the use of optimization in the preprocessing step, specifically studying a time series joining methodology, and introduces an error function to measure the adequacy of the joining. Experiments show how the method allows monitoring preprocessing errors for different time slices, indicating when a retraining of the preprocessing may be needed. Thus, this contribution helps quantify the implications of data preprocessing on the result of data analysis and machine learning methods. The methodology is applied to two case studies: synthetic simulation data with controlled distortions, and a real scenario of an industrial process.
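The abstract does not specify the joining method or the error function, so the following is only a minimal sketch of the general idea under stated assumptions: two time series are joined by nearest timestamp within a tolerance, a hypothetical join-error score is computed per fixed-length time slice, and slices whose score exceeds a threshold are flagged as candidates for re-tuning the preprocessing. The function names `nearest_join`, `join_error`, and `error_per_slice`, the error definition (1 per unmatched row, offset divided by tolerance per matched row, averaged), and the threshold are all illustrative assumptions, not the authors' method.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Sequence, Tuple

# A joined row: (left timestamp, matched right value or None, time offset or None)
Row = Tuple[float, Optional[object], Optional[float]]


def nearest_join(left_ts: Sequence[float],
                 right: Sequence[Tuple[float, object]],
                 tolerance: float) -> List[Row]:
    """Match each left timestamp to the nearest right sample within `tolerance`.

    `right` must be sorted by timestamp. Unmatched rows get (None, None).
    """
    right_ts = [t for t, _ in right]
    joined: List[Row] = []
    for t in left_ts:
        i = bisect_left(right_ts, t)
        best: Optional[Tuple[float, int]] = None
        # Only the samples just before and at the insertion point can be nearest.
        for j in (i - 1, i):
            if 0 <= j < len(right_ts):
                d = abs(right_ts[j] - t)
                if d <= tolerance and (best is None or d < best[0]):
                    best = (d, j)
        if best is None:
            joined.append((t, None, None))
        else:
            d, j = best
            joined.append((t, right[j][1], d))
    return joined


def join_error(rows: Sequence[Row], tolerance: float) -> float:
    """Hypothetical error: 1 per unmatched row, offset/tolerance per matched row, averaged."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    if not rows:
        return 0.0
    total = sum(1.0 if d is None else d / tolerance for _, _, d in rows)
    return total / len(rows)


def error_per_slice(rows: Sequence[Row], slice_len: float,
                    tolerance: float) -> Dict[int, float]:
    """Group joined rows into fixed-length time slices and score each slice."""
    slices: Dict[int, List[Row]] = {}
    for row in rows:
        slices.setdefault(int(row[0] // slice_len), []).append(row)
    return {k: join_error(v, tolerance) for k, v in sorted(slices.items())}


if __name__ == "__main__":
    left = [0, 1, 2, 3, 10, 11, 12, 13]
    right = [(0.1, "a"), (1.0, "b"), (2.2, "c"), (3.0, "d"), (10.5, "e")]
    joined = nearest_join(left, right, tolerance=0.5)
    errors = error_per_slice(joined, slice_len=10, tolerance=0.5)
    # Slices whose join error exceeds a (hypothetical) threshold may need re-tuning.
    flagged = [k for k, e in errors.items() if e > 0.5]
    print(errors, flagged)
```

In this toy input, the second slice (timestamps 10 to 13) has two matches at the edge of the tolerance and two unmatched rows, so its score is 1.0 and it is the only slice flagged; the first slice scores about 0.15. A real pipeline would more likely tune the tolerance (or the join method itself) by minimizing such an error, which is the kind of optimization the abstract describes.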
Authors (in order, with ORCID iDs where available)
– Gil, Amaia
– Quartulli, Marco (ORCID 0000-0001-5735-2072)
– Olaizola, Igor G. (ORCID 0000-0002-9965-2038)
– Sierra, Basilio (ORCID 0000-0001-8062-9332)
Cited by (DOI) 10.3390/app11094132
10.1007/s12145-023-00976-y
10.1155/2022/7400833
Copyright 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.3390/app10186346
Discipline Engineering
Sciences (General)
EISSN 2076-3417
ISSN 2076-3417
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 18
License https://creativecommons.org/licenses/by/4.0
PublicationDate 2020-09-01
PublicationPlace Basel
PublicationTitle Applied sciences
PublicationYear 2020
Publisher MDPI AG
StartPage 6346
SubjectTerms Algorithms
Case studies
Data analysis
Machine learning
Methods
Optimization
preprocessing
Time series
URI https://www.proquest.com/docview/2443201861
https://doaj.org/article/dfdb7b1c762e4de691cd94807baf7476
Volume 10