Large-Scale Visual Language Model Boosted by Contrast Domain Adaptation for Intelligent Industrial Visual Monitoring

Bibliographic Details
Published in IEEE Transactions on Industrial Informatics, Vol. 20, No. 12, pp. 14114-14123
Main Authors Wang, Huan; Li, Chenxi; Li, Yan-Fu
Format Journal Article
Language English
Published Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 December 2024

Abstract Industrial visual monitoring (IVM) is crucial for enhancing the reliability and efficiency of manufacturing processes. Recently, large vision-language models (LVLMs) have demonstrated remarkable semantic understanding and natural language interaction capabilities, which provide a novel solution to IVM. However, LVLMs pretrained on common domains lack specific knowledge of IVM scenarios, so they adapt poorly to industrial image patterns and specialized textual corpora. In this article, we study the adaptation of LVLMs to IVM in depth and propose DefectGLM. First, we present the first large-scale multimodal wafer dataset as a reliable data basis for model domain generalization. Second, the model employs low-rank adaptation (LoRA)-based contrast visual adaptation to align with industrial image patterns and uses vision-language instruction tuning for professional knowledge alignment. DefectGLM is the first large-model-based wafer image recognition model; it can accurately identify 36 types of wafer defects and provide appropriate text descriptions. DefectGLM provides a new solution for the development of industrial large models.
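The abstract names two techniques without detailing them. The following is a minimal, hypothetical pure-Python sketch of each, under stated assumptions: a LoRA-style layer in the standard formulation (frozen weight W0 plus a trainable rank-r update B A scaled by alpha / r, with B initialized to zero), and a generic one-directional (image-to-text) InfoNCE loss as one common form of contrastive alignment. The names `LoRALinear`, `matvec` and `info_nce`, the initialization scale and the temperature of 0.07 are assumptions for illustration. This is not DefectGLM's implementation; the abstract does not say where the adapters are placed or which contrastive objective is used.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA layer: frozen W0 plus a trainable update (alpha / r) * B A.

    Illustrative only; not DefectGLM's implementation.
    """

    def __init__(self, w0, r, alpha, seed=0):
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0  # pretrained weight (d_out x d_in), never updated
        # A (r x d_in) gets small Gaussian values; B (d_out x r) starts at zero,
        # so the adapted layer initially reproduces the pretrained layer exactly.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.w0, x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * u for y, u in zip(base, update)]

    def trainable_params(self):
        # Only A and B are trained: r * d_in + d_out * r entries.
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


def info_nce(img, txt, temperature=0.07):
    """Mean InfoNCE loss: img[i] should score higher with txt[i] than with any txt[j]."""

    def normalize(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]

    img = [normalize(v) for v in img]
    txt = [normalize(v) for v in txt]
    total = 0.0
    for i, u in enumerate(img):
        # Cosine similarities divided by the temperature act as logits over all texts.
        logits = [sum(p * q for p, q in zip(u, t)) / temperature for t in txt]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
        total += log_z - logits[i]  # cross-entropy with the matching index as target
    return total / len(img)
```

The parameter saving is what makes LoRA attractive for large models: for a 768 x 768 projection with r = 8, the low-rank update adds 8 x (768 + 768) = 12,288 trainable parameters, versus 589,824 for fine-tuning the full matrix. CLIP-style objectives usually average this loss over both directions (image-to-text and text-to-image); the sketch computes only one.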
Author details – Wang, Huan (ORCID 0000-0002-1403-5314; huan-wan21@mails.tsinghua.edu.cn), Department of Industrial Engineering, Tsinghua University, Beijing, China
– Li, Chenxi (ORCID 0000-0002-1082-851X), Glasgow College, University of Electronic Science and Technology of China, Chengdu, China
– Li, Yan-Fu (ORCID 0000-0001-5755-7115; liyanfu@tsinghua.edu.cn), Department of Industrial Engineering, Tsinghua University, Beijing, China
CODEN ITIICH
Cited by 10.1016/j.eswa.2025.126996
ContentType Journal Article
Copyright © 2024 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
DOI 10.1109/TII.2024.3441638
Discipline Engineering
EISSN 1941-0050
EndPage 14123
Genre orig-research
Funding – National Natural Science Foundation of China, grant 71731008 (funder ID 10.13039/501100001809)
– Beijing Municipal Natural Science Foundation-Rail Transit Joint Research Program, grant L231020
ISSN 1551-3203
Issue 12
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PageCount 10
PublicationDate 2024-12-01
PublicationPlace Piscataway
PublicationTitle IEEE Transactions on Industrial Informatics
PublicationTitleAbbrev TII
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 14114
SubjectTerms Adaptation
Adaptation models
Decoding
Defect detection
Feature extraction
Image contrast
Image enhancement
Industrial visual monitoring (IVM)
Large vision-language model (LVLM)
Monitoring
Natural language processing
Natural languages
Semiconductor device modeling
Semiconductor manufacturing
Tuning
Vision
Visualization
URI https://ieeexplore.ieee.org/document/10666846
https://www.proquest.com/docview/3141617190
Volume 20