Disk Failure Early Warning Based on the Characteristics of Customized SMART

Bibliographic Details
Published in: 2020 19th IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm), pp. 1282-1288
Main Authors: Zhao, Jian; He, Yongzhan; Liu, Hongmei; Zhang, Jiajun; Liu, Bin; Zhang, Jun; Lv, Wenqing; Zhou, Alex; Jiang, Feng; Liu, Jing; Nishi, Ahujia
Format: Conference Proceeding
Language: English
Published: IEEE 01.07.2020
Summary: With the spread of the Internet and the continued development of 5G, cloud computing, and artificial intelligence, the total global data volume is growing explosively. As more and more data is stored in data centers, traditional hard drives still host large amounts of it, and single-drive capacity is increasing at an average annual rate of more than 10%, so hard drive availability increasingly affects data security. According to statistics, hard disk failures account for more than 50% of all server failures, and data centers must continually sacrifice disk performance and time to recover data. Traditional SMART-based fault monitoring has serious shortcomings in alarm timeliness, coverage, and accuracy, and it cannot prevent failures in advance. A disk failure early warning system based on customized SMART features is designed to solve these problems. It customizes status information, error statistics, environmental information, reliability information, and so on for the basic disk-related components (disk, disc, motor, etc.). It then learns the characteristics of faulty and normal hard disks by analyzing the statistics and clustering of various factors and by applying decision-tree-related machine learning methods, gradually establishing a fault prediction model. The model allows failing hard drives to be handled in advance, with timely data backup and migration, so as to avoid failure and data loss and protect data security in the data center. The results show that the hard disk error rate, reallocated sectors, command timeouts, and similar attributes correlate strongly with hard disk failure, and that the model's accuracy in disk failure prediction can exceed 98%.
ISSN: 2577-0799
DOI: 10.1109/ITherm45881.2020.9190324
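
Illustration: The summary describes training a decision-tree-based classifier to separate failing from normal drives using SMART-derived attributes. This record does not give the paper's feature set, data, or exact algorithm, so the following is only a minimal sketch under stated assumptions: the three attribute names, the toy readings, and the choice of a plain Gini-impurity tree are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical attribute order; illustrative names, not the paper's features.
FEATURES = ["read_error_rate", "reallocated_sectors", "command_timeouts"]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree; internal nodes are tuples, leaves are class labels."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Made-up toy readings, one row per drive, in FEATURES order.
rows = [
    [2, 0, 0], [3, 1, 0], [1, 0, 1], [4, 2, 0],
    [40, 35, 6], [25, 60, 2], [55, 12, 9], [30, 48, 4],
]
labels = ["normal"] * 4 + ["failing"] * 4

tree = build_tree(rows, labels)
print(predict(tree, [35, 20, 5]))  # failing
print(predict(tree, [2, 1, 0]))    # normal
```

On real SMART logs the classes are heavily imbalanced and the attributes noisy, so a sketch like this would need held-out evaluation and tuning before any accuracy figure could be compared with the one reported in the summary.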