Learning Optimal Time Series Combination and Pre-Processing by Smart Joins
In industrial applications of data science and machine learning, most of the steps of a typical pipeline focus on optimizing measures of model fitness to the available data. Data preprocessing, by contrast, is often ad hoc and not based on the optimization of quantitative measures. This paper proposes...
Published in | Applied sciences Vol. 10; no. 18; p. 6346 |
---|---|
Main Authors | Gil, Amaia; Quartulli, Marco; Olaizola, Igor G.; Sierra, Basilio |
Format | Journal Article |
Language | English |
Published | Basel: MDPI AG, 01.09.2020 |
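Subjects | Algorithms; Case studies; Data analysis; Machine learning; Methods; Optimization; preprocessing; Time series |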
Abstract | In industrial applications of data science and machine learning, most of the steps of a typical pipeline focus on optimizing measures of model fitness to the available data. Data preprocessing, by contrast, is often ad hoc and not based on the optimization of quantitative measures. This paper proposes the use of optimization in the preprocessing step, specifically studying a time series joining methodology, and introduces an error function to measure the adequacy of the joining. Experiments show how the method allows monitoring of preprocessing errors for different time slices, indicating when retraining of the preprocessing may be needed. Thus, this contribution helps quantify the implications of data preprocessing for the results of data analysis and machine learning methods. The methodology is applied to two case studies: synthetic simulation data with controlled distortions, and a real scenario of an industrial process. |
Author | Gil, Amaia; Quartulli, Marco (ORCID 0000-0001-5735-2072); Olaizola, Igor G. (ORCID 0000-0002-9965-2038); Sierra, Basilio (ORCID 0000-0001-8062-9332) |
CitedBy_id | crossref_primary_10_3390_app11094132 crossref_primary_10_1007_s12145_023_00976_y crossref_primary_10_1155_2022_7400833 |
ContentType | Journal Article |
Copyright | 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.3390/app10186346 |
Discipline | Engineering Sciences (General) |
EISSN | 2076-3417 |
ISSN | 2076-3417 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 18 |
License | https://creativecommons.org/licenses/by/4.0 |
ORCID | 0000-0002-9965-2038 0000-0001-8062-9332 0000-0001-5735-2072 |
PublicationDate | 2020-09-01 |
PublicationPlace | Basel |
PublicationTitle | Applied sciences |
PublicationYear | 2020 |
Publisher | MDPI AG |
StartPage | 6346 |
SubjectTerms | Algorithms; Case studies; Data analysis; Machine learning; Methods; Optimization; preprocessing; Time series |
URI | https://www.proquest.com/docview/2443201861 https://doaj.org/article/dfdb7b1c762e4de691cd94807baf7476 |
Volume | 10 |