Hydrological Time Series Anomaly Mining Based on Symbolization and Distance Measure

Bibliographic Details
Published in	2014 IEEE International Congress on Big Data, pp. 339-346
Main Authors	Dingsheng Wan, Yan Xiao, Pengcheng Zhang, Jun Feng, Yuelong Zhu, Qian Liu
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2014
Subjects	Accuracy Big data Data compression Data mining Distance Measure Euclidean distance Hydrological Time Series Pattern Representation Time measurement Time series analysis

More Information
Summary:	Large hydrological data sets are a form of big data that contain a great deal of hidden and potentially useful knowledge. Extracting this knowledge can yield more valuable hydrological information and support future hydrological forecasting. Time-series-based data mining is currently in wide use, and several time-series techniques exist for extracting anomalies. However, most of them are poorly suited to large, unstable data such as hydrological big data sets; the main problems are high fitting error after dimensionality reduction and low accuracy of the mining results. In this work we propose a new approach to hydrological anomaly mining based on time series, which combines time series symbolization with a distance measure. It introduces Feature Points Symbolic Aggregate Approximation (FP SAX) to improve the selection of feature points, and then measures the distance between symbol strings with Symbol Distance based Dynamic Time Warping (SD DTW). Finally, the resulting distances are sorted. A set of dedicated experiments validates the approach, using water level data recorded at the Xiaomeikou gauge station in the Taihu Lake from 1956 to 2005. The experimental results show that our approach achieves lower fitting error and higher accuracy.
ISSN:	2379-7703
DOI:	10.1109/BigData.Congress.2014.56
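The pipeline in the summary (symbolize the series, compare symbol strings with a DTW-style distance, then sort the distances) can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify how FP SAX selects feature points or how SD DTW defines its symbol distance, so this sketch substitutes classic SAX (equal-width PAA, no feature-point selection), assumes the local cost is the standard SAX MINDIST lookup cell, and assumes each non-overlapping window is scored by its distance to the nearest other window. All function names (`sax_word`, `symbol_dist`, `sd_dtw`, `rank_anomalies`) and the toy sine data are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import NormalDist


def znormalize(xs):
    """Rescale a window to zero mean and unit (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if std < 1e-12:  # flat window: map every point to 0
        return [0.0] * n
    return [(x - mean) / std for x in xs]


def paa(xs, w):
    """Piecewise Aggregate Approximation: mean of w near-equal segments (needs len(xs) >= w)."""
    n = len(xs)
    return [sum(seg) / len(seg) for seg in (xs[i * n // w:(i + 1) * n // w] for i in range(w))]


def breakpoints(a):
    """Cut points that split N(0, 1) into a equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / a) for k in range(1, a)]


def sax_word(xs, w, bps):
    """Classic SAX (stand-in for FP SAX): z-normalize, reduce with PAA, map means to symbols 0..a-1."""
    return [bisect_right(bps, v) for v in paa(znormalize(xs), w)]


def symbol_dist(r, c, bps):
    """Assumed symbol distance: the SAX MINDIST lookup cell (equal or adjacent symbols cost 0)."""
    if abs(r - c) <= 1:
        return 0.0
    return bps[max(r, c) - 1] - bps[min(r, c)]


def sd_dtw(s, t, bps):
    """Dynamic time warping over two symbol strings, using symbol_dist as the local cost."""
    n, m = len(s), len(t)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = symbol_dist(s[i - 1], t[j - 1], bps)
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def rank_anomalies(series, window, w=8, a=4):
    """Assumed scoring: each non-overlapping window's SD-DTW distance to its nearest
    other window, sorted from most to least anomalous."""
    bps = breakpoints(a)
    words = [sax_word(series[i:i + window], w, bps)
             for i in range(0, len(series) - window + 1, window)]
    scores = []
    for i, wi in enumerate(words):
        nearest = min(sd_dtw(wi, wj, bps) for j, wj in enumerate(words) if j != i)
        scores.append((i, nearest))
    return sorted(scores, key=lambda p: p[1], reverse=True)


# Toy usage (synthetic data, not the Xiaomeikou record): eight sine cycles,
# with window 5 replaced by an inverted cycle.
L = 32
normal = [math.sin(2 * math.pi * t / L) for t in range(L)]
series = []
for k in range(8):
    series += [-v for v in normal] if k == 5 else normal
ranking = rank_anomalies(series, L)
print(ranking[0])  # the inverted window is ranked first
```

Because DTW tolerates shifts in time, a short spike can sometimes be warped away at zero cost under this assumed symbol distance; a discord that changes the overall shape, such as the inverted cycle above, is what this sketch reliably surfaces.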