A Welding Defect Detection Model Based on Hybrid-Enhanced Multi-Granularity Spatiotemporal Representation Learning

Real-time quality monitoring using molten pool images is a critical focus in researching high-quality, intelligent automated welding. To address interference problems in molten pool images under complex welding scenarios (e.g., reflected laser spots from spatter misclassified as porosity defects) an...

Full description

Saved in:

Bibliographic Details
Published in	Sensors (Basel, Switzerland) Vol. 25; no. 15; p. 4656
Main Authors	Shi, Chenbo, Yan, Shaojia, Wang, Lei, Zhu, Changsheng, Yu, Yue, Zang, Xiangteng, Liu, Aiping, Zhang, Chun, Feng, Xiaobing
Format	Journal Article
Language	English
Published	Switzerland MDPI AG 27.07.2025 MDPI
Subjects	Accuracy Cameras Decision making Deep learning Defects image interference key interference-free frames Methods Morphology multi-granularity spatiotemporal features porosity defect Vision systems key interference-free frames deep learning porosity defect image interference multi-granularity spatiotemporal features
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Real-time quality monitoring using molten pool images is a critical focus in researching high-quality, intelligent automated welding. To address interference problems in molten pool images under complex welding scenarios (e.g., reflected laser spots from spatter misclassified as porosity defects) and the limited interpretability of deep learning models, this paper proposes a multi-granularity spatiotemporal representation learning algorithm based on the hybrid enhancement of handcrafted and deep learning features. A MobileNetV2 backbone network integrated with a Temporal Shift Module (TSM) is designed to progressively capture the short-term dynamic features of the molten pool and integrate temporal information across both low-level and high-level features. A multi-granularity attention-based feature aggregation module is developed to select key interference-free frames using cross-frame attention, generate multi-granularity features via grouped pooling, and apply the Convolutional Block Attention Module (CBAM) at each granularity level. Finally, these multi-granularity spatiotemporal features are adaptively fused. Meanwhile, an independent branch utilizes the Histogram of Oriented Gradient (HOG) and Scale-Invariant Feature Transform (SIFT) features to extract long-term spatial structural information from historical edge images, enhancing the model’s interpretability. The proposed method achieves an accuracy of 99.187% on a self-constructed dataset. Additionally, it attains a real-time inference speed of 20.983 ms per sample on a hardware platform equipped with an Intel i9-12900H CPU and an RTX 3060 GPU, thus effectively balancing accuracy, speed, and interpretability.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23
ISSN:	1424-8220 1424-8220
DOI:	10.3390/s25154656