Hydrological Time Series Anomaly Mining Based on Symbolization and Distance Measure
Published in | 2014 IEEE International Congress on Big Data pp. 339 - 346 |
---|---|
Main Authors | Dingsheng Wan, Yan Xiao, Pengcheng Zhang, Jun Feng, Yuelong Zhu, Qian Liu |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.06.2014 |
Subjects | |
Abstract | Large hydrological data sets are a kind of big data and contain much hidden, potentially useful knowledge. Extracting this knowledge can provide more valuable hydrological information and support future hydrological forecasting. Data mining based on time series is now widely used, and several time series techniques exist for extracting anomalies. However, most of these techniques are ill-suited to large, unstable data such as hydrological big data sets. Two important problems are high fitting error after dimension reduction and low accuracy of the mining results. In this work we propose a new approach to hydrological anomaly mining based on time series, which combines time series symbolization with a distance measure. It proposes Feature Points Symbolic Aggregate Approximation (FP SAX) to improve the selection of feature points, and then measures the distance between the resulting strings with Symbol Distance based Dynamic Time Warping (SD DTW). Finally, the distances obtained are sorted. A set of dedicated experiments is performed to validate the approach. The experimental data set is the water level data obtained from the Xiaomeikou gauge station on Taihu Lake from 1956 to 2005. The experimental results show that the approach has lower fitting error and higher accuracy. |
---|---|
Author | Yuelong Zhu; Jun Feng; Qian Liu; Pengcheng Zhang; Dingsheng Wan; Yan Xiao |
Author_xml | – sequence: 1 surname: Dingsheng Wan fullname: Dingsheng Wan organization: Coll. of Comput. & Inf., Hohai Univ., Nanjing, China – sequence: 2 surname: Yan Xiao fullname: Yan Xiao email: hhu_xiaoyan@163.com organization: Coll. of Comput. & Inf., Hohai Univ., Nanjing, China – sequence: 3 surname: Pengcheng Zhang fullname: Pengcheng Zhang email: pchzhang@hhu.edu.cn organization: Coll. of Comput. & Inf., Hohai Univ., Nanjing, China – sequence: 4 surname: Jun Feng fullname: Jun Feng organization: Coll. of Comput. & Inf., Hohai Univ., Nanjing, China – sequence: 5 surname: Yuelong Zhu fullname: Yuelong Zhu organization: Coll. of Comput. & Inf., Hohai Univ., Nanjing, China – sequence: 6 surname: Qian Liu fullname: Qian Liu organization: Coll. of Comput. & Inf., Hohai Univ., Nanjing, China |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DOI | 10.1109/BigData.Congress.2014.56 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
Discipline | Computer Science |
EISBN | 1479950564 9781479950577 9781479950560 1479950572 |
EndPage | 346 |
ExternalDocumentID | 6906799 |
Genre | orig-research |
ISSN | 2379-7703 |
IngestDate | Wed Jun 26 19:23:52 EDT 2024 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
PageCount | 8 |
ParticipantIDs | ieee_primary_6906799 |
PublicationCentury | 2000 |
PublicationDate | 20140601 |
PublicationDateYYYYMMDD | 2014-06-01 |
PublicationDate_xml | – month: 06 year: 2014 text: 20140601 day: 01 |
PublicationDecade | 2010 |
PublicationTitle | 2014 IEEE International Congress on Big Data |
PublicationTitleAbbrev | bigdatacongress |
PublicationYear | 2014 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SourceID | ieee |
SourceType | Publisher |
StartPage | 339 |
SubjectTerms | Accuracy; Big data; Data compression; Data mining; Distance Measure; Euclidean distance; Hydrological Time Series; Pattern Representation; Time measurement; Time series analysis |
Title | Hydrological Time Series Anomaly Mining Based on Symbolization and Distance Measure |
URI | https://ieeexplore.ieee.org/document/6906799 |
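
The abstract outlines a three-step pipeline: symbolize the time series, measure distances between the resulting strings, and sort the distances. The record gives no details of FP SAX or SD DTW, so the following is only a rough sketch of that pipeline with standard stand-ins: classic SAX (piecewise aggregate approximation plus Gaussian breakpoints) in place of FP SAX, and plain dynamic time warping with the absolute difference of symbol indexes as the local cost in place of SD DTW. All function names, parameters, and the nearest-neighbour scoring rule are assumptions made for illustration, not the authors' method.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale the series to zero mean and unit variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(values, n_segments):
    """Piecewise aggregate approximation: mean of each of n_segments chunks."""
    n = len(values)
    if n < n_segments:
        raise ValueError("need at least one point per segment")
    return [mean(values[i * n // n_segments:(i + 1) * n // n_segments])
            for i in range(n_segments)]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def symbolize(values, alphabet_size):
    """Map each value to a symbol index in 0 .. alphabet_size - 1."""
    cuts = breakpoints(alphabet_size)
    return [bisect_right(cuts, v) for v in values]


def dtw(s, t):
    """DTW distance between two symbol sequences.

    Local cost is |a - b| on symbol indexes (an assumed symbol distance).
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(t)
    for a in s:
        cur = [inf]
        for j, b in enumerate(t, 1):
            cur.append(abs(a - b) + min(prev[j], cur[j - 1], prev[j - 1]))
        prev = cur
    return prev[-1]


def rank_windows(series, window, n_segments, alphabet_size):
    """Score each non-overlapping window by the DTW distance to its nearest
    other window (an assumed scoring rule), then sort, largest first.
    Requires at least two complete windows."""
    z = znormalize(series)
    words = [symbolize(paa(z[i:i + window], n_segments), alphabet_size)
             for i in range(0, len(z) - window + 1, window)]
    scores = []
    for i, word in enumerate(words):
        nearest = min(dtw(word, other)
                      for j, other in enumerate(words) if j != i)
        scores.append((nearest, i))
    return sorted(scores, reverse=True)
```

Under these assumptions, a call such as `rank_windows(levels, window=365, n_segments=12, alphabet_size=5)` on a hypothetical daily water level list `levels` would return `(score, window_index)` pairs, with the windows least similar to any other window listed first as anomaly candidates.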