Multi-failure detection using device hierarchical attention network
Published in | Expert systems with applications Vol. 203; p. 117277 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.10.2022 |
Subjects | |
Summary: | With rapid developments in the information industry, data centers have become increasingly important for collecting and storing data. The devices in data centers are connected to external machines to provide a variety of services and also store vast amounts of data, so device failures in data centers can cause critical and costly damage. Various methods have been studied in recent years to effectively predict failures in connected devices. However, in data center-scale systems, failures occur infrequently for any individual device, which makes per-device failure prediction difficult. In addition, complex failures may arise within the data center from the mix of devices and systems, and in such cases the cause of failure is difficult to determine. In this study, we present a device hierarchical attention network (DHAN) methodology that predicts the failures of all devices by simultaneously using information from the devices in the data center. Because the devices in the data center can affect each other, the information from the devices is used in a composite manner. With this approach, failure was predicted more effectively than when using information from a single device. In addition, by extracting attention information from the DHAN model, we identified which devices play an important role in predicting the failure of a particular device. We then used this information to cluster the devices and reconstruct the DHAN model, which predicted failures more effectively. Based on these results, it is expected that the system can be stably maintained and repaired by identifying the potential impact of the devices.
• A network model is proposed to predict multi-device failure in a data center.
• A helpful and relevant device subset can be obtained to predict failure.
• Our model has excellent performance due to use of relevant device information. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2022.117277 |