IPMOD: An efficient outlier detection model for high-dimensional medical data streams

Bibliographic Details
Published in Expert systems with applications Vol. 191; p. 116212
Main Authors Yang, Yun, Fan, ChongJun, Chen, Liang, Xiong, HongLin
Format Journal Article
Language English
Published New York Elsevier Ltd 01.04.2022
Elsevier BV

Summary: Real-time outlier detection in high-dimensional medical data streams is a critical and challenging research problem that is of great help to disease prevention and source analysis. Although much research has been done on outlier detection for time-series data streams, existing methods have two shortcomings: (1) insufficient detection accuracy on high-dimensional data streams; (2) insufficient accuracy in dynamic data stream scenarios. To this end, we propose a sliding window model based on efficient pruning and information entropy, namely IPMOD (Information Entropy-Pruning Multi-dimensional Outlier Detection). In IPMOD, we first design a new index weight measurement method that uses information entropy to quantify the weight of each index in multi-dimensional data and thereby determine the influence of different attributes on the prediction results. We then design a new sliding window and sub-sequence measurement mechanism that judges whether data is abnormal based on the distance between the target sequence and its non-self match. Finally, we design a pruning strategy to further reduce the computational complexity of the algorithm. Comprehensive experiments show that the proposed scheme not only achieves higher detection accuracy than many current schemes on multiple real data sets but also quickly detects outliers in different medical data streams in real time.
• We designed a new real-time outlier detection algorithm for multi-dimensional medical data streams.
• The proposed anomaly detection algorithm does not need prior information about the data stream (such as feature distribution or other density information).
• The weighting scheme designed for high-dimensional data streams can accurately distinguish the impact of different attributes on detection accuracy.
• We designed a new pruning strategy that can greatly reduce the time consumption of the algorithm.
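The summary gives no formulas, so the following is only a rough sketch of the two ideas it names, not the authors' implementation. It assumes the textbook entropy weight method for attribute weights and a brute-force weighted Euclidean distance to the nearest non-self match (a subsequence that does not overlap the target). The function names and the window length `m` are hypothetical, and the pruning step is omitted.

```python
import math

def entropy_weights(rows):
    """Assumed textbook entropy weight method: one weight per attribute."""
    n, d = len(rows), len(rows[0])
    raw = []
    for j in range(d):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        if hi == lo:
            raw.append(0.0)  # a constant attribute carries no information
            continue
        z = [(v - lo) / (hi - lo) for v in col]  # min-max normalisation
        total = sum(z)
        # Normalised Shannon entropy of the column, with 0*log(0) taken as 0.
        entropy = -sum((v / total) * math.log(v / total)
                       for v in z if v > 0) / math.log(n)
        raw.append(1.0 - entropy)  # lower entropy gives a larger weight
    s = sum(raw)
    return [r / s for r in raw] if s > 0 else [1.0 / d] * d

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two equal-length windows."""
    return math.sqrt(sum(w * (x - y) ** 2
                         for ra, rb in zip(a, b)
                         for w, x, y in zip(weights, ra, rb)))

def non_self_match_scores(rows, m, weights):
    """Score each length-m window by its distance to the nearest
    non-overlapping window; a large score suggests an outlier."""
    k = len(rows) - m + 1
    scores = []
    for i in range(k):
        best = math.inf
        for j in range(k):
            if abs(i - j) < m:  # overlapping windows are self matches
                continue
            best = min(best, weighted_distance(rows[i:i + m],
                                               rows[j:j + m], weights))
        scores.append(best)
    return scores
```

This brute-force loop costs quadratic time in the number of windows, which is the kind of cost a pruning strategy such as the one the summary mentions would aim to reduce.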
ISSN: 0957-4174
1873-6793
DOI: 10.1016/j.eswa.2021.116212