Hydrological Time Series Anomaly Mining Based on Symbolization and Distance Measure

Bibliographic Details
Published in 2014 IEEE International Congress on Big Data, pp. 339-346
Main Authors Dingsheng Wan, Yan Xiao, Pengcheng Zhang, Jun Feng, Yuelong Zhu, Qian Liu
Format Conference Proceeding
Language English
Published IEEE 01.06.2014
Abstract Large hydrological data sets are a kind of big data that contains much hidden and potentially useful knowledge. Extracting this knowledge can provide more valuable hydrological information and support future hydrological forecasting. Data mining based on time series is now widely used, and several time-series techniques exist for extracting anomalies. However, most of these techniques are not suited to large, unstable data such as hydrological big data sets; the main problems are high fitting error after dimension reduction and low accuracy of the mining results. In this work we propose a new approach to hydrological anomaly mining based on time series that combines time series symbolization with a distance measure. It uses Feature Points Symbolic Aggregate Approximation (FP SAX) to improve the selection of feature points, and then measures the distance between the resulting strings with Symbol Distance based Dynamic Time Warping (SD DTW). Finally, the distances obtained are sorted. A set of dedicated experiments is performed to validate the approach, using water level data from the Xiaomeikou gauge station in Taihu Lake from 1956 to 2005. The experimental results show that our approach has lower fitting error and higher accuracy.
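As a rough illustration of the pipeline the abstract outlines (symbolize, measure string distance, sort), the sketch below uses plain SAX and textbook DTW over symbol indices. It is not the authors' FP SAX or SD DTW, whose details are not given in this record; the four-letter alphabet, the per-window layout, and all function names are assumptions made for the example.

```python
import math
from statistics import mean, pstdev

# Assumption: a 4-letter alphabet. These breakpoints split a standard
# normal distribution into four equiprobable regions (standard SAX).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def znorm(series):
    """Z-normalize a series; a constant series maps to all zeros."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each chunk.

    Assumes segments <= len(series) so that no chunk is empty.
    """
    n = len(series)
    return [
        mean(series[i * n // segments:(i + 1) * n // segments])
        for i in range(segments)
    ]


def sax(series, segments):
    """Plain SAX word for one window (not the paper's FP SAX)."""
    word = []
    for value in paa(znorm(series), segments):
        index = sum(1 for b in BREAKPOINTS if value > b)
        word.append(ALPHABET[index])
    return "".join(word)


def symbol_cost(a, b):
    """Hypothetical symbol distance: gap between alphabet positions."""
    return abs(ord(a) - ord(b))


def dtw(s, t):
    """Textbook DTW over two symbol strings (not the paper's SD DTW)."""
    d = [[math.inf] * (len(t) + 1) for _ in range(len(s) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost = symbol_cost(s[i - 1], t[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(s)][len(t)]


def rank_anomalies(windows, segments=4):
    """Score each window by its mean DTW distance to all other windows
    and return (score, window_index) pairs, most anomalous first."""
    words = [sax(w, segments) for w in windows]
    scores = []
    for i, word in enumerate(words):
        others = [dtw(word, v) for j, v in enumerate(words) if j != i]
        scores.append((mean(others), i))
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    rising = [1, 2, 3, 4, 5, 6, 7, 8]
    falling = list(reversed(rising))
    for score, index in rank_anomalies([rising, rising, rising, falling]):
        print(index, score)
```

Because each window is z-normalized on its own, this sketch ranks windows by shape rather than by absolute water level; a real application would need to choose the window length, alphabet size, and normalization to match the data.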
Author Yuelong Zhu
Jun Feng
Qian Liu
Pengcheng Zhang
Dingsheng Wan
Yan Xiao
Author_xml – sequence: 1
  surname: Dingsheng Wan
  fullname: Dingsheng Wan
  organization: Coll. of Comput. & Inf., Hohai Univ., Nanjing, China
– sequence: 2
  surname: Yan Xiao
  fullname: Yan Xiao
  email: hhu_xiaoyan@163.com
  organization: Coll. of Comput. & Inf., Hohai Univ., Nanjing, China
– sequence: 3
  surname: Pengcheng Zhang
  fullname: Pengcheng Zhang
  email: pchzhang@hhu.edu.cn
  organization: Coll. of Comput. & Inf., Hohai Univ., Nanjing, China
– sequence: 4
  surname: Jun Feng
  fullname: Jun Feng
  organization: Coll. of Comput. & Inf., Hohai Univ., Nanjing, China
– sequence: 5
  surname: Yuelong Zhu
  fullname: Yuelong Zhu
  organization: Coll. of Comput. & Inf., Hohai Univ., Nanjing, China
– sequence: 6
  surname: Qian Liu
  fullname: Qian Liu
  organization: Coll. of Comput. & Inf., Hohai Univ., Nanjing, China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/BigData.Congress.2014.56
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 1479950564
9781479950577
9781479950560
1479950572
EndPage 346
ExternalDocumentID 6906799
Genre orig-research
GroupedDBID 6IE
6IF
6IL
6IN
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
OCL
RIE
RIL
IEDL.DBID RIE
ISSN 2379-7703
IngestDate Wed Jun 26 19:23:52 EDT 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 8
ParticipantIDs ieee_primary_6906799
PublicationCentury 2000
PublicationDate 20140601
PublicationDateYYYYMMDD 2014-06-01
PublicationDate_xml – month: 06
  year: 2014
  text: 20140601
  day: 01
PublicationDecade 2010
PublicationTitle 2014 IEEE International Congress on Big Data
PublicationTitleAbbrev bigdatacongress
PublicationYear 2014
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib026764573
ssj0003203847
Score 1.60929
SourceID ieee
SourceType Publisher
StartPage 339
SubjectTerms Accuracy
Big data
Data compression
Data mining
Distance Measure
Euclidean distance
Hydrological Time Series
Pattern Representation
Time measurement
Time series analysis
Title Hydrological Time Series Anomaly Mining Based on Symbolization and Distance Measure
URI https://ieeexplore.ieee.org/document/6906799
hasFullText 1
inHoldings 1
isFullTextHit
isPrint