Large-Scale Visual Language Model Boosted by Contrast Domain Adaptation for Intelligent Industrial Visual Monitoring

Bibliographic Details
Published in	IEEE Transactions on Industrial Informatics, Vol. 20, no. 12, pp. 14114-14123
Main Authors	Wang, Huan; Li, Chenxi; Li, Yan-Fu
Format	Journal Article
Language	English
Published	Piscataway: IEEE, 01.12.2024; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Adaptation; Adaptation models; Decoding; Defect detection; Feature extraction; Image contrast; Image enhancement; industrial visual monitoring (IVM); large vision-language model (LVLM); Monitoring; Natural language processing; Natural languages; Semiconductor device modeling; semiconductor manufacturing; Tuning; Vision; Visualization
Abstract	Industrial visual monitoring (IVM) is crucial in enhancing the reliability and efficiency of manufacturing processes. Recently, large vision-language models (LVLMs) have demonstrated remarkable semantic understanding and natural language interaction capabilities, which provide a novel solution to IVM. However, LVLMs pretrained on common domains lack specific knowledge for IVM scenarios, causing insufficient adaptation to industrial image patterns and specialized textual corpora. In this article, we deeply studied the adaptation of LVLMs to IVM and proposed DefectGLM. First, we proposed the first large-scale multimodal wafer dataset as a reliable data basis for model domain generalization. Second, this model employs low-rank adaptation-based contrast visual adaptation to align with industrial image patterns and utilizes vision-language instruction tuning for professional knowledge alignment. DefectGLM is the first large-model-based wafer image recognition model, and can accurately identify 36 types of wafer defects and provide appropriate text descriptions. DefectGLM provides a new solution for the development of industrial large models.
Author	Wang, Huan Li, Yan-Fu Li, Chenxi
Author_xml	– sequence: 1 givenname: Huan orcidid: 0000-0002-1403-5314 surname: Wang fullname: Wang, Huan email: huan-wan21@mails.tsinghua.edu.cn organization: Department of Industrial Engineering, Tsinghua University, Beijing, China – sequence: 2 givenname: Chenxi orcidid: 0000-0002-1082-851X surname: Li fullname: Li, Chenxi organization: Glasgow College, University of Electronic Science and Technology of China, Chengdu, China – sequence: 3 givenname: Yan-Fu orcidid: 0000-0001-5755-7115 surname: Li fullname: Li, Yan-Fu email: liyanfu@tsinghua.edu.cn organization: Department of Industrial Engineering, Tsinghua University, Beijing, China
CODEN	ITIICH
CitedBy_id	crossref_primary_10_1016_j_eswa_2025_126996
ContentType	Journal Article
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Copyright_xml	– notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DBID	97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D
DOI	10.1109/TII.2024.3441638
DatabaseName	IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional
DatabaseTitle	CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional
DatabaseTitleList	Technology Research Database
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Engineering
EISSN	1941-0050
EndPage	14123
ExternalDocumentID	10_1109_TII_2024_3441638 10666846
Genre	orig-research
GrantInformation_xml	– fundername: National Natural Science Foundation of China grantid: 71731008 funderid: 10.13039/501100001809 – fundername: Beijing Municipal Natural Science Foundation-Rail Transit Joint Research Program grantid: L231020
IEDL.DBID	RIE
ISSN	1551-3203
IngestDate	Mon Jun 30 10:11:50 EDT 2025 Tue Jul 01 03:00:31 EDT 2025 Thu Apr 24 23:09:52 EDT 2025 Wed Aug 27 01:57:01 EDT 2025
IsPeerReviewed	true
IsScholarly	true
Issue	12
Language	English
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037
LinkModel	DirectLink
Notes	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14
ORCID	0000-0002-1403-5314 0000-0002-1082-851X 0000-0001-5755-7115
PQID	3141617190
PQPubID	85507
PageCount	10
ParticipantIDs	proquest_journals_3141617190 crossref_primary_10_1109_TII_2024_3441638 crossref_citationtrail_10_1109_TII_2024_3441638 ieee_primary_10666846
ProviderPackageCode	CITATION AAYXX
PublicationCentury	2000
PublicationDate	2024-12-01
PublicationDateYYYYMMDD	2024-12-01
PublicationDate_xml	– month: 12 year: 2024 text: 2024-12-01 day: 01
PublicationDecade	2020
PublicationPlace	Piscataway
PublicationPlace_xml	– name: Piscataway
PublicationTitle	IEEE transactions on industrial informatics
PublicationTitleAbbrev	TII
PublicationYear	2024
Publisher	IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml	– name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SSID	ssj0037039
Score	2.4262683
SourceID	proquest crossref ieee
SourceType	Aggregation Database Enrichment Source Index Database Publisher
StartPage	14114
SubjectTerms	Adaptation; Adaptation models; Decoding; Defect detection; Feature extraction; Image contrast; Image enhancement; industrial visual monitoring (IVM); large vision-language model (LVLM); Monitoring; Natural language processing; Natural languages; Semiconductor device modeling; semiconductor manufacturing; Tuning; Vision; Visualization
Title	Large-Scale Visual Language Model Boosted by Contrast Domain Adaptation for Intelligent Industrial Visual Monitoring
URI	https://ieeexplore.ieee.org/document/10666846 https://www.proquest.com/docview/3141617190
Volume	20
hasFullText	1
inHoldings	1