Active learning-assisted semi-supervised learning for fault detection and diagnostics with imbalanced dataset

Bibliographic Details
Published in	IISE Transactions Vol. 55; no. 7; pp. 672-686
Main Authors	Peng, Xiaomeng; Jin, Xiaoning; Duan, Shiming; Sankavaram, Chaitanya
Format	Journal Article
Language	English
Published	Abingdon: Taylor & Francis Ltd, 03.07.2023
Subjects	Active learning; Air intakes; Algorithms; Datasets; Failure modes; Fault detection; Fault detection and diagnostics; Imbalanced data; Intake systems; Labels; Machine learning; Multiple criterion; Prediction models; Semi-supervised learning; Synthetic data
Summary:	Data-driven Fault Detection and Diagnostics (FDD) methods often assume that sufficient labeled samples are available, that the classes are balanced, and that every fault class encountered in testing has precedent, i.e., was seen during model training. When monitoring a large fleet of assets at scale, these assumptions may be violated: (I) only a limited number of samples can be manually labeled due to constraints of time and/or cost; (II) most of the samples collected from engineering systems are recorded under normal conditions, leading to a highly imbalanced class distribution and a biased prediction model. This work presents a robust and cost-effective FDD framework that integrates active learning and semi-supervised learning methods to iteratively detect both known and unknown failure modes. The framework strategically selects the samples to be annotated from a fully unlabeled dataset while keeping the labeling cost minimal. Specifically, a novel graph-based semi-supervised classifier with adaptive graph construction is developed to predict labels from imbalanced data and to detect novel classes. A multi-criteria active learning sampling strategy is designed to select the most informative samples from the unlabeled data, so that only a minimal number of labels needs to be queried for classification. The framework and algorithms were tested on three synthetic datasets and one real-world dataset of vehicle air intake systems, and demonstrated superior performance compared with state-of-the-art methods for fleet-level FDD.
ISSN:	2472-5854, 2472-5862
DOI:	10.1080/24725854.2022.2074579
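
The loop described in the Summary (propagate labels over a graph, then query the most informative unlabeled sample) can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy data, the fixed dense RBF graph (the paper uses adaptive graph construction), the standard label-spreading update, and the two selection criteria (uncertainty and distance to the nearest labeled sample) are all assumptions chosen for illustration, as are the names `propagate`, `select_query`, and `run`.

```python
import math

# Toy imbalanced data (hypothetical): 36 "normal" points on a grid near the
# origin and 4 "fault" points far away. True labels sit behind an oracle.
X = [(0.25 * i, 0.25 * j) for i in range(6) for j in range(6)]
X += [(5.0, 5.0), (5.25, 5.0), (5.0, 5.25), (5.25, 5.25)]
TRUE = ["normal"] * 36 + ["fault"] * 4

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def propagate(X, labeled, classes, gamma=1.0, alpha=0.9, iters=30):
    """Label spreading on a dense RBF graph: F <- alpha*S*F + (1-alpha)*Y."""
    n, k = len(X), len(classes)
    W = [[0.0 if i == j else math.exp(-gamma * dist(X[i], X[j]) ** 2)
          for j in range(n)] for i in range(n)]
    S = []
    for row in W:  # row-normalize the affinities
        total = sum(row)
        S.append([w / total for w in row])
    Y = [[0.0] * k for _ in range(n)]
    for i, lab in labeled.items():
        Y[i][classes.index(lab)] = 1.0
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][j] * F[j][c] for j in range(n))
              + (1 - alpha) * Y[i][c] for c in range(k)] for i in range(n)]
    probs = []
    for row in F:  # turn propagated scores into class probabilities
        total = sum(row)
        probs.append([v / total for v in row] if total > 0 else [1.0 / k] * k)
    return probs

def select_query(X, labeled, probs):
    """Pick the unlabeled sample maximizing uncertainty + novelty.
    Both criteria are illustrative assumptions, not the paper's criteria."""
    unlabeled = [i for i in range(len(X)) if i not in labeled]
    nearest = {i: min(dist(X[i], X[j]) for j in labeled) for i in unlabeled}
    scale = max(nearest.values()) or 1.0

    def score(i):
        uncertainty = 1.0 - max(probs[i])  # low confidence in the top class
        novelty = nearest[i] / scale       # far from every labeled sample
        return uncertainty + novelty

    return max(unlabeled, key=score)

def run(budget=5):
    labeled = {0: TRUE[0]}  # first query: an arbitrary sample
    for _ in range(budget - 1):
        classes = sorted(set(labeled.values()))
        probs = propagate(X, labeled, classes)
        q = select_query(X, labeled, probs)
        labeled[q] = TRUE[q]  # the oracle (human annotator) supplies the label
    classes = sorted(set(labeled.values()))
    probs = propagate(X, labeled, classes)
    pred = [classes[p.index(max(p))] for p in probs]
    accuracy = sum(p == t for p, t in zip(pred, TRUE)) / len(TRUE)
    return labeled, pred, accuracy

labeled, pred, accuracy = run()
print(len(labeled), sorted(set(labeled.values())), accuracy)
```

In this sketch the novelty term is what lets the loop find a class it has never seen: while only "normal" labels exist, every prediction is fully confident, so the sample farthest from the labeled set is queried next.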