DistAD: Software Anomaly Detection Based on Execution Trace Distribution

Bibliographic Details
Published in	arXiv.org
Main Authors	Kong, Shiyi; Ai, Jun; Lu, Minyan; Wang, Shuguang; Wong, W Eric
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 26.04.2022
Subjects	Anomalies; Complex systems; Control equipment; Fault detection; Fault tolerance; Instruments; Neural networks; Recurrent neural networks; Run time (computers); Software; Software engineering

Summary:	Modern software systems have become increasingly complex, which makes them difficult to test and validate. Detecting partial software anomalies in complex systems at runtime can help handle unintended software behaviors, avoid catastrophic software failures, and improve software runtime availability. These detection techniques aim to identify the manifestation of faults (anomalies) before they lead to unavoidable failures, thereby supporting subsequent runtime fault-tolerance techniques. In this work, we propose a novel anomaly detection method named DistAD, which is based on the distribution of dynamic software execution traces collected at runtime. Unlike existing works that rely on key performance indicators, DistAD collects the execution trace at runtime via intrusive instrumentation. The instrumentation is controlled by a sampling mechanism to avoid excessive overhead. Bi-directional Long Short-Term Memory (Bi-LSTM), a Recurrent Neural Network (RNN) architecture, is used to perform the anomaly detection. The whole framework is built under a One-Class Neural Network (OCNN) learning mode, which helps overcome the shortage of labeled samples and the data imbalance problem. A series of controlled experiments is conducted on Cassandra, a widely used database system, to demonstrate the validity and feasibility of the proposed method. The overhead introduced by the intrusive probing is also evaluated. The results show that DistAD achieves more than 70% accuracy and 90% recall (in normal states) while incurring no more than a 2x overhead relative to unmonitored executions.
ISSN:	2331-8422
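
The summary mentions two mechanisms that a small example may make more concrete: intrusive probes whose firing is throttled by sampling, and a distribution computed over the collected execution trace. The sketch below is only an illustration of that general idea, not the authors' implementation. The class `SampledTracer`, its `sample_rate` parameter, the per-call random sampling rule, and the use of function names as trace events are all assumptions introduced here; the record does not say how DistAD samples or what it counts.

```python
import random
from collections import Counter
from functools import wraps


class SampledTracer:
    """Hypothetical probe: records a function-entry event with probability sample_rate."""

    def __init__(self, sample_rate=0.1, seed=None):
        if not 0.0 <= sample_rate <= 1.0:
            raise ValueError("sample_rate must be in [0, 1]")
        self.sample_rate = sample_rate
        self._rng = random.Random(seed)  # private RNG so sampling is reproducible
        self.counts = Counter()          # event name -> number of sampled hits

    def probe(self, func):
        """Decorator that instruments func (assumed stand-in for an intrusive probe)."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Sampling keeps overhead down: most calls skip the bookkeeping.
            if self._rng.random() < self.sample_rate:
                self.counts[func.__name__] += 1
            return func(*args, **kwargs)
        return wrapper

    def distribution(self, vocabulary):
        """Relative frequency of each event in vocabulary, in the given order."""
        total = sum(self.counts[name] for name in vocabulary)
        if total == 0:
            return [0.0] * len(vocabulary)
        return [self.counts[name] / total for name in vocabulary]

    def reset(self):
        """Start a new observation window."""
        self.counts.clear()


# Usage: sample every call so the result is deterministic.
tracer = SampledTracer(sample_rate=1.0)


@tracer.probe
def read_row():
    return "row"


@tracer.probe
def write_row():
    return None


for _ in range(3):
    read_row()
write_row()

window = tracer.distribution(["read_row", "write_row"])
print(window)  # [0.75, 0.25]
```

Under these assumptions, a sequence of such per-window vectors is the kind of input a sequence model such as a Bi-LSTM could consume. Lowering `sample_rate` trades the fidelity of the distribution for lower probing overhead.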