DistAD: Software Anomaly Detection Based on Execution Trace Distribution
Main Authors | , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 28.02.2022 |
Subjects | |
Summary: | Modern software systems have become increasingly complex, which makes them
difficult to test and validate. Detecting partial software anomalies in complex
systems at runtime can assist with handling unintended software behaviors,
avoiding catastrophic software failures and improving software runtime
availability. These detection techniques aim to identify the manifestation of
faults (anomalies) before they ultimately lead to unavoidable failures, thus
supporting subsequent runtime fault-tolerance techniques. In this work, we
propose a novel anomaly detection method named DistAD, which is based on the
distribution of software runtime dynamic execution traces. Unlike existing
works that rely on key performance indicators, DistAD collects the execution
trace at runtime via intrusive instrumentation. Instrumentation is controlled
by a sampling mechanism to avoid excessive overhead. Bi-directional Long
Short-Term Memory (Bi-LSTM), a Recurrent Neural Network (RNN) architecture,
is used to perform the anomaly detection. The whole
framework is constructed under a One-Class Neural Network (OCNN) learning mode,
which helps overcome the lack of sufficient labeled samples and the data
imbalance issue. A series of controlled experiments are conducted on
a widely used database system named Cassandra to demonstrate the validity and
feasibility of the proposed method. The overhead introduced by the intrusive
probing is also evaluated. The results show that DistAD can achieve more than
70% accuracy and 90% recall (in normal states) at an overhead of no more than
2 times compared with unmonitored executions. |
---|---|
DOI: | 10.48550/arxiv.2202.13898 |
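
As a rough illustration of the idea in the summary, and not the authors' implementation, the sketch below keeps each instrumentation event with a fixed probability and turns the resulting trace into a normalized frequency distribution over probe IDs. The function names, the sampling rate and the probe identifiers (`read`, `write`, `compact`, `flush`) are all hypothetical; the record does not say how DistAD samples or encodes traces.

```python
import random
from collections import Counter


def sample_trace(events, rate, rng):
    """Keep each probe event with probability `rate` (assumed sampling policy)."""
    return [e for e in events if rng.random() < rate]


def trace_distribution(trace, vocabulary):
    """Relative frequency of each probe ID in `trace`, in vocabulary order."""
    counts = Counter(trace)
    total = sum(counts[p] for p in vocabulary)
    if total == 0:
        # An empty (or fully out-of-vocabulary) trace yields an all-zero vector.
        return [0.0] * len(vocabulary)
    return [counts[p] / total for p in vocabulary]


# Hypothetical probe IDs and a toy trace; real traces would come from instrumentation.
vocabulary = ["read", "write", "compact", "flush"]
full_trace = ["read", "read", "write", "read", "flush", "write", "read", "compact"]

dist_full = trace_distribution(full_trace, vocabulary)

# Sampling trades fidelity of the distribution for lower probing overhead.
sampled = sample_trace(full_trace, rate=0.5, rng=random.Random(0))
dist_sampled = trace_distribution(sampled, vocabulary)
```

A sequence of such per-window vectors is one plausible input for a sequence model such as a Bi-LSTM trained on normal executions only; whether DistAD uses this exact encoding is not stated here.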