Improving Performance of Log Anomaly Detection With Semantic and Time Features Based on BiLSTM-Attention

Bibliographic Details
Published in 2021 2nd International Conference on Electronics, Communications and Information Technology (CECIT), pp. 661-666
Main Authors Li, Xinqiang; Niu, Weina; Zhang, Xiaosong; Zhang, Runzi; Yu, Zhenqi; Li, Zimu
Format Conference Proceeding
Language English
Published IEEE 01.12.2021
Summary: As system software grows in scale and complexity, effectively capturing, analyzing, and locating abnormal behavior during system operation has become a recognized problem in the software testing field. Traditional white-box anomaly detection methods require the system source code and cannot effectively detect abnormal behavior that occurs while the program is running. Black-box anomaly detection methods rely on test cases and achieve low code coverage. Anomaly detection based on the logs generated during system operation can alleviate these problems. However, existing log-based anomaly detection methods mainly extract log template features and cannot effectively identify logical and temporal anomalies in the logs. To detect anomalies in log sequences more comprehensively, this paper proposes an anomaly detection method based on BiLSTM that models the semantic and temporal features of the logs together. For semantics, the BERT natural language processing model extracts contextual semantic information from the logs; for time, log time-interval features are extracted based on three potential relationships among the logs. The proposed method also adds an attention mechanism to balance feature weights. Finally, we evaluate our method experimentally on two real datasets, HDFS and OpenStack. The experimental results show that our method substantially improves accuracy and recall.
DOI: 10.1109/CECIT53797.2021.00121
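
The summary describes combining per-log semantic vectors with time-interval features and then weighting them with an attention mechanism. The sketch below illustrates only that general idea and is not the authors' implementation. Several things in it are assumptions: the interval is simply the gap in seconds to the previous log entry (the summary does not define the three log relationships it uses), the semantic vectors are hand-written stand-ins for BERT embeddings, the attention is plain dot-product attention with a fixed hypothetical query vector, and the BiLSTM layer is omitted entirely. All function names are hypothetical.

```python
import math
from datetime import datetime


def time_intervals(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Seconds between each log entry and the previous one (0.0 for the first).

    Assumption: a plain consecutive gap, not the paper's interval definition.
    """
    parsed = [datetime.strptime(t, fmt) for t in timestamps]
    return [0.0] + [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


def combine_features(semantic_vectors, intervals):
    """Append each entry's time interval to its semantic vector."""
    return [list(vec) + [dt] for vec, dt in zip(semantic_vectors, intervals)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, query):
    """Dot-product attention: score each step against the query, then
    return the softmax-weighted sum of the steps and the weights."""
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [sum(w * state[d] for w, state in zip(weights, states))
              for d in range(dim)]
    return pooled, weights


# Hypothetical three-entry log sequence.
timestamps = ["2021-12-01 10:00:00", "2021-12-01 10:00:02", "2021-12-01 10:00:45"]
semantic = [[0.1, 0.3], [0.2, 0.1], [0.9, 0.4]]  # stand-ins for BERT embeddings
query = [1.0, 1.0, 0.1]                          # stand-in for a learned query

intervals = time_intervals(timestamps)           # [0.0, 2.0, 43.0]
features = combine_features(semantic, intervals)
pooled, weights = attention_pool(features, query)
print(intervals, weights, pooled)
```

In a trained model the per-step vectors fed to the attention layer would be BiLSTM hidden states and the query would be learned; here both are fixed so that the weighting step can be inspected in isolation.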