Anomaly Detection from System Tracing Data Using Multimodal Deep Learning

Bibliographic Details
Published in IEEE ... International Conference on Cloud Computing, pp. 179-186
Main Authors Nedelkoski, Sasho; Cardoso, Jorge; Kao, Odej
Format Conference Proceeding
Language English
Published IEEE 01.07.2019
Summary: The concept of Artificial Intelligence for IT Operations (AIOps) combines big data and machine learning methods to replace a broad range of IT operations, including availability and performance monitoring of services. Such platforms typically use separate models for each modality of monitoring data (e.g., textual properties and real-valued response times in logs and traces) to detect faults and upcoming anomalies in cloud services; these separate models do not capture the existing correlation between the modalities. This paper extends the range of data types used to create a single model in order to improve anomaly detection. We use bimodal distributed tracing data from large cloud infrastructures to detect anomalies in the execution of system components. We first propose an anomaly detection method that uses a single modality of the data together with information about the trace structure. We then extend the single-modality neural architecture to a multimodal neural network with long short-term memory (LSTM) that learns from the sequential nature of both modalities in the tracing data. Furthermore, we demonstrate an approach to detect dependent and concurrent events using the ability of the model to reconstruct the execution path. The implemented prototype is experimentally evaluated with data from a large-scale production cloud. The results demonstrate that the novel approaches outperform other deep-learning methods based on traditional architectures.
ISSN: 2159-6190
DOI: 10.1109/CLOUD.2019.00038
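
Illustrative sketch (not taken from the paper): the summary describes a model that learns from two modalities of a trace, the textual event type and the real-valued response time. One way such a model's output might be turned into an anomaly decision is sketched below. The top-k rule, the tolerance threshold, and the names `Event`, `Prediction`, `is_anomalous`, and `anomalous_positions` are all assumptions made for illustration; the trained LSTM itself is not shown and is represented only by its hypothetical per-step predictions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    name: str             # textual modality: type of the event in the trace
    response_time: float  # real-valued modality: response time (arbitrary units)


@dataclass
class Prediction:
    # Hypothetical model output for one step of a trace (assumed shape).
    next_event_probs: Dict[str, float]  # probability per candidate event name
    response_time: float                # predicted response time


def is_anomalous(observed: Event, predicted: Prediction,
                 top_k: int = 2, rt_tolerance: float = 0.5) -> bool:
    """Flag an event if either modality disagrees with the prediction.

    Assumed rule: the event is structurally anomalous when its name is not
    among the top_k most probable names, and temporally anomalous when the
    response time deviates from the prediction by more than rt_tolerance.
    """
    ranked = sorted(predicted.next_event_probs,
                    key=predicted.next_event_probs.get, reverse=True)
    structural = observed.name not in ranked[:top_k]
    temporal = abs(observed.response_time - predicted.response_time) > rt_tolerance
    return structural or temporal


def anomalous_positions(trace: List[Event], predictions: List[Prediction],
                        top_k: int = 2, rt_tolerance: float = 0.5) -> List[int]:
    """Return the indices of events in the trace that are flagged."""
    return [i for i, (event, pred) in enumerate(zip(trace, predictions))
            if is_anomalous(event, pred, top_k, rt_tolerance)]
```

In this sketch a single decision combines both modalities, so an event with an expected name but an unusual response time is still flagged. Whether the paper combines the two signals in this way is not stated in the summary.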