Research on FCM-LR cross electricity theft detection based on big data user profile

Bibliographic Details
Published in: International Journal of System Assurance Engineering and Management, Vol. 15, No. 7, pp. 3251-3265
Main Authors: Hu, Ronghui; Zhen, Tong
Format: Journal Article
Language: English
Published: New Delhi, Springer India, 01.07.2024
Springer Nature B.V.

More Information
Summary: Data-driven electricity theft detection (ETD) based on machine learning and deep learning offers automation, real-time performance, and efficiency, but it requires a large amount of labeled data to train models. In practice, the imbalance ratio between positive and unlabeled samples has reached 1:200, which significantly limits the accuracy of ETD models. This setting is known as positive-unlabeled learning. Down-sampling discards a large number of negative samples, while up-sampling makes the ETD model less robust. Both can produce ETD models that perform well in experimental environments but poorly in production environments. In this context, this paper proposes a semi-supervised electricity theft detection algorithm based on fuzzy c-means and logistic regression cross detection (FCM-LR). First, a statistical feature set based on business data and load data is proposed to depict the profile of electricity users, which reduces the complexity of the data structure. Second, the FCM-LR method maximizes the utilization of unlabeled data and can discover new electricity theft patterns. Simulation results show that the method detects theft effectively, with precision, recall, F1 score, and area under the curve (AUC) all approaching 99%.
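The summary only outlines FCM-LR, so the following is a hedged illustration of the general idea of crossing a fuzzy clustering with a classifier in a positive-unlabeled setting, not the authors' procedure. It uses a plain NumPy fuzzy c-means and a class-balanced logistic regression on synthetic data. All specifics are assumptions: the two synthetic "profile" features, the rule that unlabeled users with theft-cluster membership below 0.2 serve as reliable negatives, and the cross-detection rule that flags an unlabeled user only when both the FCM membership and the logistic-regression probability exceed 0.5.

```python
import numpy as np


def fuzzy_c_means(X, c=2, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means: alternate center and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero at a center
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U


def fit_logistic(X, y, lr=0.5, n_iter=3000, l2=1e-3):
    """Gradient-descent logistic regression with balanced class weights."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append intercept column
    sw = np.where(y == 1, 0.5 / y.sum(), 0.5 / (1 - y).sum())  # each class weighs 0.5
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        grad = Xb.T @ (sw * (p - y))
        grad[:-1] += l2 * w[:-1]  # L2 penalty on weights, not on the intercept
        w -= lr * grad
    return w


def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-(Xb @ w)))


# Hypothetical data: 400 normal users and 40 thieves, two profile features.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal([0.0, 0.0], 0.5, (400, 2)),
               rng.normal([3.0, 3.0], 0.5, (40, 2))])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize features
is_theft = np.zeros(len(X), dtype=bool)
is_theft[400:] = True
labeled = np.zeros(len(X), dtype=bool)
labeled[400:410] = True  # only 10 of the 40 thieves carry a positive label

# 1. Fuzzy clustering of all users; the theft cluster is the one the labeled
#    positives belong to most strongly.
centers, U = fuzzy_c_means(X, c=2)
u_theft = U[:, int(np.argmax(U[labeled].mean(axis=0)))]

# 2. Reliable negatives: unlabeled users far from the theft cluster.
reliable_neg = (~labeled) & (u_theft < 0.2)
train = labeled | reliable_neg
w = fit_logistic(X[train], labeled[train].astype(float))
p_lr = predict_proba(w, X)

# 3. Cross detection: both models must agree before an unlabeled user is flagged.
suspected = (~labeled) & (u_theft > 0.5) & (p_lr > 0.5)

hits = (suspected & is_theft).sum()
precision = hits / max(suspected.sum(), 1)
recall = hits / (is_theft & ~labeled).sum()
print(f"flagged={suspected.sum()} precision={precision:.2f} recall={recall:.2f}")
```

Requiring agreement from both models (an AND rule) favors precision over recall; an OR rule or a weighted combination of the two scores would shift that balance toward recall.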
ISSN: 0975-6809, 0976-4348
DOI: 10.1007/s13198-024-02333-8