Fully Unsupervised Anomaly Detection in Industrial Images with Unknown Data Contamination

Bibliographic Details
Published in: Swiss Conference on Data Science (Online), pp. 40-47
Main Authors: Wuest, Matthias; Huber, Lilach Goren
Format: Conference Proceeding
Language: English
Published: IEEE, 26.06.2025
Summary: AI algorithms for the automatic detection of unusual or abnormal patterns in image data have become increasingly important in industrial quality inspection, improving product quality and operational efficiency. Most state-of-the-art Image Anomaly Detection (IAD) methods are based on unsupervised approaches, learning normal patterns from anomaly-free training data. However, in real-world applications the assumption of anomaly-free training data is often unrealistic, as labeling anomalies in the historical data can be expensive, error-prone, or even impossible. Anomalies contaminating the training data typically lead to degraded anomaly detection (AD) performance at deployment, yet this issue remains largely overlooked in research. Some studies have attempted to mitigate this challenge through data refinement methods. However, these approaches often require prior knowledge of the anomaly ratio (AR) in the training data, which is rarely available in practice. In this paper, we introduce Overlapping Subsets Data Refinement (OSDR), a simple, fully unsupervised, and model-agnostic refinement framework designed to address IAD under data contamination with no prior assumptions about the AR. OSDR assigns a refinement score to each training sample using an ensemble of models trained on partially overlapping data subsets, followed by robust anomaly removal through an adaptive thresholding technique. Evaluations on two widely used industrial image datasets demonstrate that OSDR effectively recovers the performance lost to contamination and outperforms existing refinement frameworks. Our approach provides a flexible, practical, and easy-to-deploy solution for IAD in real-world settings, where data contamination or mislabeling is often inevitable and the anomaly ratio is unknown.
ISSN: 2835-3420
DOI: 10.1109/SDS66131.2025.00013
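
Illustrative sketch: the summary describes OSDR only at a high level (an ensemble trained on partially overlapping subsets, a per-sample refinement score, then adaptive thresholding), so the following is a rough toy illustration of that general idea and not the authors' implementation. Everything specific here is an assumption: the cyclic-window subset construction, the `overlap` parameter, the per-feature mean/std "model" standing in for a real IAD model, averaging scores only over models that did not train on a sample, and the median + k * MAD threshold rule. All function names are hypothetical.

```python
import math
import random
import statistics


def overlapping_subsets(n_samples, n_subsets, overlap, rng):
    """Cut a shuffled index list into cyclic windows that partially overlap.

    `overlap` is the extra fraction each window borrows from the next one
    (a hypothetical parameter, not taken from the paper).
    """
    order = list(range(n_samples))
    rng.shuffle(order)
    stride = n_samples // n_subsets
    width = min(n_samples - 1, round(stride * (1 + overlap)))
    return [
        [order[(k * stride + j) % n_samples] for j in range(width)]
        for k in range(n_subsets)
    ]


def fit_toy_model(rows):
    """Stand-in for any IAD model: per-feature mean and standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) + 1e-8 for c in columns]
    return means, stds


def anomaly_score(model, row):
    """Euclidean norm of the per-feature z-scores of one sample."""
    means, stds = model
    return math.sqrt(sum(((v - m) / s) ** 2 for v, m, s in zip(row, means, stds)))


def refinement_scores(data, subsets):
    """Average each sample's score over the models that did not train on it."""
    totals = [0.0] * len(data)
    counts = [0] * len(data)
    for subset in subsets:
        model = fit_toy_model([data[i] for i in subset])
        seen = set(subset)
        for i, row in enumerate(data):
            if i not in seen:
                totals[i] += anomaly_score(model, row)
                counts[i] += 1
    if 0 in counts:
        raise ValueError("each sample must be left out of at least one subset")
    return [t / c for t, c in zip(totals, counts)]


def adaptive_threshold(scores, k=3.5):
    """Median + k * scaled MAD, a generic robust outlier rule (placeholder)."""
    med = statistics.median(scores)
    mad = 1.4826 * statistics.median(abs(s - med) for s in scores)
    return med + k * mad


def refine(data, n_subsets=5, overlap=0.5, seed=0):
    """Return a keep/drop flag per sample plus the refinement scores."""
    rng = random.Random(seed)
    subsets = overlapping_subsets(len(data), n_subsets, overlap, rng)
    scores = refinement_scores(data, subsets)
    limit = adaptive_threshold(scores)
    return [s <= limit for s in scores], scores


# Toy data: 190 "normal" feature vectors and 10 shifted "anomalous" ones.
data_rng = random.Random(42)
data = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(190)]
data += [[data_rng.gauss(6.0, 1.0) for _ in range(8)] for _ in range(10)]
keep, scores = refine(data)
print(sum(keep), "of", len(data), "samples kept")
```

Note that the threshold in this sketch is derived from the score distribution itself, so no anomaly ratio is passed in anywhere; this mirrors the "no prior assumptions about the AR" property stated in the summary, though the rule the paper actually uses may differ. In a real pipeline the toy model would be replaced by an image anomaly detector, and the retained samples would be used to retrain it.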