Large-Scale Visual Language Model Boosted by Contrast Domain Adaptation for Intelligent Industrial Visual Monitoring

Bibliographic Details
Published in	IEEE Transactions on Industrial Informatics, Vol. 20, no. 12, pp. 14114-14123
Main Authors	Wang, Huan; Li, Chenxi; Li, Yan-Fu
Format	Journal Article
Language	English
Published	Piscataway: IEEE, 01.12.2024; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Adaptation; Adaptation models; Decoding; Defect detection; Feature extraction; Image contrast; Image enhancement; industrial visual monitoring (IVM); large vision-language model (LVLM); Monitoring; Natural language processing; Natural languages; Semiconductor device modeling; semiconductor manufacturing; Tuning; Vision; Visualization

More Information
Summary:	Industrial visual monitoring (IVM) is crucial in enhancing the reliability and efficiency of manufacturing processes. Recently, large vision-language models (LVLMs) have demonstrated remarkable semantic understanding and natural language interaction capabilities, which provide a novel solution to IVM. However, LVLMs pretrained on common domains lack specific knowledge for IVM scenarios, causing insufficient adaptation to industrial image patterns and specialized textual corpora. In this article, we study the adaptation of LVLMs to IVM in depth and propose DefectGLM. First, we construct the first large-scale multimodal wafer dataset as a reliable data basis for model domain generalization. Second, the model employs low-rank adaptation-based contrast visual adaptation to align with industrial image patterns and uses vision-language instruction tuning for professional knowledge alignment. DefectGLM is the first large-model-based wafer image recognition model; it can accurately identify 36 types of wafer defects and provide appropriate text descriptions. DefectGLM provides a new solution for the development of industrial large models.
ISSN:	1551-3203, 1941-0050
DOI:	10.1109/TII.2024.3441638