Fault Detection and Localization in Distributed Systems Using Recurrent Convolutional Neural Networks

Bibliographic Details
Published in: Advanced Data Mining and Applications, pp. 33-48
Main Authors: Qi, Guangyang; Yao, Lina; Uzunov, Anton V.
Format: Book Chapter
Language: English
Published: Cham, Springer International Publishing
Series: Lecture Notes in Computer Science
Summary: Early detection of faults is essential to maintaining the reliability of a distributed system. Many fault-detection solutions exist, but handling the high dimensionality and uncertainty of system observations well enough to detect faults accurately remains a challenge. In this paper, we address this challenge with a two-dimensional convolutional neural network, structured as a denoising autoencoder and combined with recurrent neural networks, that performs simultaneous fault detection and diagnosis based on real-time metrics from a given distributed system (e.g., CPU usage and memory consumption). The model provides a unified way to learn useful features automatically and to make adaptive inferences about the onset of faults, without hand-crafted feature extraction or human diagnostic expertise. In addition, we develop a Bayesian change-point detection approach for fault localization, in order to support the fault recovery process. We conducted extensive experiments in a real distributed environment on Amazon EC2, and the results show that our approach outperforms a variety of state-of-the-art machine learning algorithms used for fault detection and diagnosis in distributed systems.
ISBN: 9783319691787, 3319691783
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-319-69179-4_3
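
Illustrative note: the summary mentions a Bayesian change-point detection approach but this record gives no details of it. The following is only a minimal sketch of the general idea, not the authors' method. It assumes a single metric series, exactly one change point, Gaussian noise with known variance, and a Gaussian prior on each segment mean. The function names, parameters (`mu0`, `sigma2`, `tau2`), and sample values are all hypothetical.

```python
import math

def log_marginal(segment, mu0=0.0, sigma2=1.0, tau2=10.0):
    """Log marginal likelihood of a segment whose unknown mean has a
    N(mu0, tau2) prior and whose noise is N(0, sigma2) (assumed model)."""
    n = len(segment)
    dev = [x - mu0 for x in segment]
    s1 = sum(dev)
    s2 = sum(v * v for v in dev)
    # Quadratic form under covariance sigma2 * I + tau2 * 1 1^T.
    quad = (s2 - tau2 * s1 * s1 / (sigma2 + n * tau2)) / sigma2
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - 0.5 * math.log(1 + n * tau2 / sigma2)
            - 0.5 * quad)

def change_point_posterior(series, **prior):
    """Posterior over the index k at which the mean shifts, under a
    uniform prior on k in 1..len(series)-1."""
    n = len(series)
    candidates = range(1, n)
    logs = [log_marginal(series[:k], **prior) + log_marginal(series[k:], **prior)
            for k in candidates]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return {k: w / total for k, w in zip(candidates, weights)}

# Hypothetical standardized CPU-usage readings with a shift after 5 samples.
cpu = [0.1, -0.2, 0.0, 0.15, -0.1, 3.1, 2.9, 3.2, 3.0, 2.8]
posterior = change_point_posterior(cpu)
onset = max(posterior, key=posterior.get)  # most probable change index
```

Here `onset` is the index of the first sample in the post-change segment. A real system would need to handle multiple metrics, multiple change points, and unknown noise variance, none of which this sketch attempts.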