Large-Scale Visual Language Model Boosted by Contrast Domain Adaptation for Intelligent Industrial Visual Monitoring

Bibliographic Details
Published in: IEEE Transactions on Industrial Informatics, Vol. 20, no. 12, pp. 14114-14123
Main Authors: Wang, Huan; Li, Chenxi; Li, Yan-Fu
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 01.12.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Industrial visual monitoring (IVM) is crucial in enhancing the reliability and efficiency of manufacturing processes. Recently, large vision-language models (LVLMs) have demonstrated remarkable semantic understanding and natural language interaction capabilities, which provide a novel solution to IVM. However, LVLMs pretrained on common domains lack specific knowledge for IVM scenarios, causing insufficient adaptation to industrial image patterns and specialized textual corpora. In this article, we deeply studied the adaptation of LVLMs to IVM and proposed DefectGLM. First, we proposed the first large-scale multimodal wafer dataset as a reliable data basis for model domain generalization. Second, this model employs low-rank adaptation-based contrast visual adaptation to align with industrial image patterns and utilizes vision-language instruction tuning for professional knowledge alignment. DefectGLM is the first large-model-based wafer image recognition model, and can accurately identify 36 types of wafer defects and provide appropriate text descriptions. DefectGLM provides a new solution for the development of industrial large models.
ISSN: 1551-3203, 1941-0050
DOI: 10.1109/TII.2024.3441638