Research on FCM-LR cross electricity theft detection based on big data user profile

Bibliographic Details
Published in	International Journal of System Assurance Engineering and Management, Vol. 15, no. 7, pp. 3251-3265
Main Authors	Hu, Ronghui; Zhen, Tong
Format	Journal Article
Language	English
Published	New Delhi: Springer India, 01.07.2024; Springer Nature B.V.
Subjects	Algorithms; Big Data; Data structures; Deep learning; Electricity; Engineering; Engineering Economics; Logistics; Machine learning; Marketing; Organization; Original Article; Quality Control; Real time; Reliability; Safety and Risk; Sampling; Statistical analysis; Theft; Fuzzy c-means and logistic regression cross detection (FCM-LR); User profile; Imbalance; Electricity theft detection (ETD)
Summary:	Data-driven electricity theft detection (ETD) based on machine learning and deep learning offers automation, real-time performance, and efficiency, but requires a large amount of labeled data to train models. However, the imbalance ratio between positive and unlabeled samples has reached 1:200, which significantly limits the accuracy of ETD models. This setting is referred to as positive-unlabeled learning. Down-sampling wastes a large number of negative samples, while up-sampling makes the ETD model less robust. Both can lead to ETD models that perform well in experimental environments but poorly in production environments. To address this, the paper proposes a semi-supervised electricity theft detection algorithm based on fuzzy c-means and logistic regression cross detection (FCM-LR). First, a statistical feature set built from business data and load data is proposed to depict the profile of electricity users, which reduces the complexity of the data structure. Second, the FCM-LR method maximizes the use of unlabeled data and allows new electricity theft patterns to be discovered. Simulation results show that the method detects theft effectively, with Precision, Recall, F1, and Area Under Curve all approaching 99%.
ISSN:	0975-6809 0976-4348
DOI:	10.1007/s13198-024-02333-8
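
The summary names fuzzy c-means (FCM) as the clustering half of FCM-LR but does not describe the procedure. As a rough illustration only, the following is a minimal sketch of the standard textbook FCM update, not the paper's implementation. The two-feature user profiles, the initial centers, and the function names are all hypothetical, and the logistic regression step and the cross-detection rule are left out because this record does not specify them.

```python
import math

def memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which point i belongs to cluster j."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point sits exactly on one or more centers: share full
            # membership among those centers and give none to the rest.
            hits = [d == 0.0 for d in dists]
            row = [1.0 / sum(hits) if h else 0.0 for h in hits]
        else:
            row = [1.0 / sum((dj / dk) ** power for dk in dists)
                   for dj in dists]
        u.append(row)
    return u

def update_centers(points, u, m=2.0):
    """Recompute each center as the mean of the points weighted by u**m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers

def fuzzy_c_means(points, init_centers, m=2.0, iters=100, tol=1e-6):
    """Alternate membership and center updates until the centers settle."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = update_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)

# Hypothetical, made-up profiles: two statistical features per user.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centers, u = fuzzy_c_means(points, init_centers=[(0.0, 0.0), (6.0, 6.0)])
```

Each row of `u` sums to 1, so every user receives a soft membership in each cluster rather than a hard label. That soft assignment is what distinguishes FCM from k-means.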