Cleaning Noises from Time Series Data with Memory Effects

Bibliographic Details
Published in	韓國컴퓨터情報學會論文誌 (Journal of the Korea Society of Computer and Information), Vol. 25, No. 4, pp. 37–45
Main Authors	Cho, Jae-Han; Lee, Lee-Sub
Format	Journal Article
Language	Korean
Published	2020
Summary:	The development process for deep learning is iterative and requires a great deal of manual work. Among its steps, pre-processing the training data is very costly and significantly affects the learning results. In the early days of AI algorithm research, training data were mainly taken from public databases that data scientists had already thoroughly cleaned. Training data collected in real environments, by contrast, consist mostly of sensor operational data and inevitably contain a lot of noise. Accordingly, various data-cleaning frameworks and noise-removal methods have been studied. In this paper, we propose a method for detecting and removing noise in time-series data, such as the sensor data generated in IoT environments. The method uses linear regression: the system repeatedly finds noisy points and supplies values that can replace them, thereby cleaning the training data. To verify the effectiveness of the proposed method, we carried out simulations and, through them, established a way to determine the factors that yield optimal cleaning results.
Bibliography:	KISTI1.1003/JNL.JAKO202013363975560
ISSN:	1598-849X (print); 2383-9945 (online)
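The summary describes the cleaning method only at a high level. The Python sketch below is one possible reading of it, not the authors' implementation. It assumes that the "memory" is a sliding window of the most recent `window` already-cleaned points, that a point counts as noise when it differs from the window's linear-regression forecast by more than an absolute `threshold`, and that the simulation injects known spikes into a clean reference signal and grid-searches `window` and `threshold` by RMSE. All function names (`fit_line`, `clean_series`, `rmse`, `tune`) and all parameter values are hypothetical.

```python
import math
from itertools import product


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def clean_series(values, window=5, threshold=2.0):
    """Hypothetical regression-based cleaner (an assumption, not the paper's code).

    Scans the series point by point. Each point t >= window is compared with a
    linear-regression forecast fitted to the previous `window` cleaned points
    (the assumed "memory"). If it deviates by more than `threshold`, it is
    treated as noise and replaced by the forecast. The first `window` points
    are never checked.
    """
    cleaned = [float(v) for v in values]
    flagged = []
    for t in range(window, len(cleaned)):
        xs = list(range(t - window, t))
        a, b = fit_line(xs, cleaned[t - window:t])
        forecast = a + b * t
        if abs(cleaned[t] - forecast) > threshold:
            cleaned[t] = forecast  # replacement value for the noisy point
            flagged.append(t)
    return cleaned, flagged


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def tune(truth, noisy, windows, thresholds):
    """Hypothetical factor search: pick (window, threshold) with the lowest
    RMSE against the known clean signal. Returns (score, window, threshold)."""
    best = None
    for w, th in product(windows, thresholds):
        cleaned, _ = clean_series(noisy, window=w, threshold=th)
        score = rmse(cleaned, truth)
        if best is None or score < best[0]:
            best = (score, w, th)
    return best


# Assumed simulation: a smooth reference signal with injected spikes.
truth = [0.1 * t + math.sin(t / 10) for t in range(100)]
noisy = list(truth)
for t in (20, 21, 55, 80):
    noisy[t] += 10.0

best_score, best_window, best_threshold = tune(
    truth, noisy, windows=[3, 5, 8], thresholds=[0.5, 1.0, 2.0, 4.0]
)
print(best_window, best_threshold, round(best_score, 4))
```

Because each forecast is fitted to cleaned history, a replaced spike does not distort the forecasts for the points that follow it. The trade-off of this design is that the first `window` points are never checked, and a spike inside that initial window would bias the early forecasts.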