Research on FCM-LR cross electricity theft detection based on big data user profile
Published in | International Journal of System Assurance Engineering and Management, Vol. 15, no. 7, pp. 3251-3265 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | New Delhi: Springer India, 01.07.2024; Springer Nature B.V |
Subjects | |
Summary: | Data-driven electricity theft detection (ETD) based on machine learning and deep learning offers automation, real-time performance, and efficiency, but it requires a large amount of labeled data to train models. However, the imbalance ratio between positive and unlabeled samples has reached 1:200, which significantly limits the accuracy of ETD models. This setting is referred to as positive-unlabeled (PU) learning. Down-sampling discards a large number of negative samples, while up-sampling makes the ETD model less robust. Both can produce ETD models that perform well in experimental environments but poorly in production. In this context, this paper proposes a semi-supervised electricity theft detection algorithm based on fuzzy c-means and logistic regression cross detection (FCM-LR). First, a statistical feature set built from business data and load data is proposed to profile electricity users, which reduces the complexity of the data structure. Second, the FCM-LR method maximizes the use of unlabeled data and can discover new electricity theft patterns. Simulation results show that the method detects electricity theft effectively, with Precision, Recall, F1, and Area Under the Curve (AUC) all approaching 99%. |
---|---|
ISSN: | 0975-6809, 0976-4348 |
DOI: | 10.1007/s13198-024-02333-8 |
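The summary's objection to resampling can be made concrete with a little arithmetic. The sketch below assumes plain random down-sampling to a 1:1 balance and plain duplication-based up-sampling; the helper names `downsampling_waste` and `upsampling_copies` are hypothetical and do not come from the paper.

```python
def downsampling_waste(n_pos: int, ratio: int) -> float:
    """Fraction of negative (unlabeled) samples discarded when randomly
    down-sampling a 1:ratio data set to a 1:1 class balance."""
    n_neg = n_pos * ratio
    kept = n_pos  # keep exactly one negative per positive
    return (n_neg - kept) / n_neg


def upsampling_copies(n_pos: int, ratio: int) -> int:
    """Number of extra positive rows that duplication-based up-sampling
    must add to reach a 1:1 class balance."""
    n_neg = n_pos * ratio
    return n_neg - n_pos


if __name__ == "__main__":
    # At the 1:200 ratio quoted in the summary, assuming 100 known thieves:
    print(downsampling_waste(100, 200))  # 0.995
    print(upsampling_copies(100, 200))   # 19900
```

Down-sampling therefore throws away 99.5% of the unlabeled users, while duplication-based up-sampling shows each real theft case to the model 200 times, a plausible source of the overfitting that undermines robustness.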
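This record does not list the paper's actual statistical features. As an illustration only, the sketch below condenses one user's daily load readings plus one business attribute into a handful of summary statistics. The `UserRecord` fields (`daily_kwh`, `contracted_kw`) and every feature name are hypothetical assumptions.

```python
import statistics
from dataclasses import dataclass


@dataclass
class UserRecord:
    # Hypothetical inputs: daily energy readings in kWh (load data) and
    # the contracted capacity in kW (business data).
    daily_kwh: list[float]
    contracted_kw: float


def profile_features(rec: UserRecord) -> dict[str, float]:
    """Collapse a load series into a fixed-length statistical profile."""
    x = rec.daily_kwh
    n = len(x)
    mean = statistics.fmean(x)
    std = statistics.pstdev(x)  # population standard deviation
    # Least-squares slope of consumption against day index (trend).
    t_mean = (n - 1) / 2
    num = sum((t - t_mean) * (v - mean) for t, v in enumerate(x))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den if den else 0.0
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean if mean else 0.0,  # coefficient of variation
        "max": max(x),
        "min": min(x),
        "zero_day_ratio": sum(1 for v in x if v == 0) / n,
        "trend_slope": slope,
        # Mean daily energy relative to what the contracted capacity
        # could deliver in 24 hours.
        "load_factor": mean / (rec.contracted_kw * 24) if rec.contracted_kw else 0.0,
    }


if __name__ == "__main__":
    rec = UserRecord(daily_kwh=[10.0, 10.0, 0.0, 20.0], contracted_kw=5.0)
    print(profile_features(rec))
```

Whatever the real feature list is, the design idea is the same: a year of daily readings (365 values) shrinks to a few numbers per user, which is the kind of reduction in data-structure complexity the summary describes.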
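Fuzzy c-means is a standard algorithm: every point receives a membership degree in every cluster, and cluster centers are membership-weighted means. A minimal pure-Python sketch follows. The fuzzifier `m = 2`, the tolerance, and the seeded random initialization are common defaults chosen here as assumptions, not parameters reported by the paper.

```python
import math
import random


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means. Returns (centers, u) where u[i][k] is the membership
    of point i in cluster k; each row of u sums to 1."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = [[0.0] * d for _ in range(c)]
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Center update: mean of the points weighted by membership ** m.
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers[k] = [sum(w[i] * points[i][j] for i in range(n)) / sw
                          for j in range(d)]
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        max_change = 0.0
        for i in range(n):
            dist = [math.dist(points[i], centers[k]) for k in range(c)]
            if min(dist) == 0.0:
                # Point sits exactly on a center: share membership among those centers.
                hits = [1.0 if dk == 0.0 else 0.0 for dk in dist]
                new = [h / sum(hits) for h in hits]
            else:
                new = [1.0 / sum((dist[k] / dist[j]) ** p for j in range(c))
                       for k in range(c)]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if max_change < tol:
            break
    return centers, u
```

Unlike k-means, a user near a cluster boundary keeps partial membership in both clusters, so soft scores remain available for the unlabeled majority rather than forcing a hard assignment.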
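This record does not spell out how FCM and LR are crossed. One hypothetical reading, shown only as an illustration: train a logistic regression on labeled thieves versus unlabeled users (the common PU shortcut of treating unlabeled as negative), then cross-check its probability against each user's FCM membership in a "theft-like" cluster, and route users on which the two detectors disagree to manual review as possible new theft patterns. The function names, thresholds, and the three labels (`suspect`, `review`, `normal`) are all assumptions.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(X, y, lr=0.5, epochs=2000, l2=0.0):
    """Batch gradient-descent logistic regression on mean log-loss
    (plus optional L2 penalty). Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, x) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def cross_detect(lr_probs, theft_membership, p_thr=0.5, u_thr=0.5):
    """Hypothetical cross-check of LR probability and FCM membership."""
    out = []
    for p, u in zip(lr_probs, theft_membership):
        if p >= p_thr and u >= u_thr:
            out.append("suspect")   # both detectors agree on theft
        elif p >= p_thr or u >= u_thr:
            out.append("review")    # disagreement: new pattern or false alarm
        else:
            out.append("normal")    # both detectors agree on normal use
    return out
```

The "review" bucket is where a scheme like this could surface theft patterns absent from the labels: the unsupervised clustering flags them even when the supervised model, trained only on known cases, does not.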
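The four metrics quoted in the summary are standard. The dependency-free sketch below computes Precision, Recall, and F1 from hard 0/1 labels, and AUC from scores using the rank-statistic definition (the probability that a random positive outscores a random negative, with ties counted as one half). It runs in O(P·N) time and is meant for illustration, not for large data sets.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, Recall and F1 for binary labels (1 = theft)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

Under a 1:200 imbalance, a model that predicts "no theft" for everyone is 200/201, about 99.5%, accurate yet has a Recall of 0. This is one reason class-sensitive metrics such as these are preferred over plain accuracy.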