Large-Scale Visual Language Model Boosted by Contrast Domain Adaptation for Intelligent Industrial Visual Monitoring
Published in | IEEE transactions on industrial informatics Vol. 20; no. 12; pp. 14114 - 14123 |
---|---|
Main Authors | Wang, Huan; Li, Chenxi; Li, Yan-Fu |
Format | Journal Article |
Language | English |
Published | Piscataway; IEEE; 01.12.2024; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Abstract | Industrial visual monitoring (IVM) is crucial in enhancing the reliability and efficiency of manufacturing processes. Recently, large vision-language models (LVLMs) have demonstrated remarkable semantic understanding and natural language interaction capabilities, which provide a novel solution to IVM. However, LVLMs pretrained on common domains lack specific knowledge of IVM scenarios, so they adapt poorly to industrial image patterns and specialized textual corpora. In this article, we study the adaptation of LVLMs to IVM in depth and propose DefectGLM. First, we present the first large-scale multimodal wafer dataset as a reliable data basis for model domain generalization. Second, the model employs low-rank adaptation-based contrast visual adaptation to align with industrial image patterns and uses vision-language instruction tuning to align with professional knowledge. DefectGLM is the first large-model-based wafer image recognition model; it can accurately identify 36 types of wafer defects and provide appropriate text descriptions. DefectGLM provides a new solution for the development of industrial large models. |
---|---|
Author | Wang, Huan; Li, Yan-Fu; Li, Chenxi |
Author_xml | – sequence: 1 givenname: Huan orcidid: 0000-0002-1403-5314 surname: Wang fullname: Wang, Huan email: huan-wan21@mails.tsinghua.edu.cn organization: Department of Industrial Engineering, Tsinghua University, Beijing, China – sequence: 2 givenname: Chenxi orcidid: 0000-0002-1082-851X surname: Li fullname: Li, Chenxi organization: Glasgow College, University of Electronic Science and Technology of China, Chengdu, China – sequence: 3 givenname: Yan-Fu orcidid: 0000-0001-5755-7115 surname: Li fullname: Li, Yan-Fu email: liyanfu@tsinghua.edu.cn organization: Department of Industrial Engineering, Tsinghua University, Beijing, China |
CODEN | ITIICH |
CitedBy_id | crossref_primary_10_1016_j_eswa_2025_126996 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
DOI | 10.1109/TII.2024.3441638 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present; IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional |
DatabaseTitle | CrossRef; Technology Research Database; Computer and Information Systems Abstracts – Academic; Electronics & Communications Abstracts; ProQuest Computer Science Collection; Computer and Information Systems Abstracts; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Technology Research Database |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering |
EISSN | 1941-0050 |
EndPage | 14123 |
ExternalDocumentID | 10_1109_TII_2024_3441638 10666846 |
Genre | orig-research |
GrantInformation_xml | – fundername: National Natural Science Foundation of China grantid: 71731008 funderid: 10.13039/501100001809 – fundername: Beijing Municipal Natural Science Foundation-Rail Transit Joint Research Program grantid: L231020 |
IEDL.DBID | RIE |
ISSN | 1551-3203 |
IngestDate | Mon Jun 30 10:11:50 EDT 2025 Tue Jul 01 03:00:31 EDT 2025 Thu Apr 24 23:09:52 EDT 2025 Wed Aug 27 01:57:01 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 12 |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
LinkModel | DirectLink |
ORCID | 0000-0002-1403-5314 0000-0002-1082-851X 0000-0001-5755-7115 |
PQID | 3141617190 |
PQPubID | 85507 |
PageCount | 10 |
ParticipantIDs | proquest_journals_3141617190 crossref_primary_10_1109_TII_2024_3441638 crossref_citationtrail_10_1109_TII_2024_3441638 ieee_primary_10666846 |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | 2024-12-01 |
PublicationDateYYYYMMDD | 2024-12-01 |
PublicationDate_xml | – month: 12 year: 2024 text: 2024-12-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | Piscataway |
PublicationPlace_xml | – name: Piscataway |
PublicationTitle | IEEE transactions on industrial informatics |
PublicationTitleAbbrev | TII |
PublicationYear | 2024 |
Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
SSID | ssj0037039 |
Score | 2.4262683 |
SourceID | proquest crossref ieee |
SourceType | Aggregation Database Enrichment Source Index Database Publisher |
StartPage | 14114 |
SubjectTerms | Adaptation Adaptation models Decoding Defect detection Feature extraction Image contrast Image enhancement industrial visual monitoring (IVM) large vision-language model (LVLM) Monitoring Natural language processing Natural languages Semiconductor device modeling semiconductor manufacturing Tuning Vision Visualization |
Title | Large-Scale Visual Language Model Boosted by Contrast Domain Adaptation for Intelligent Industrial Visual Monitoring |
URI | https://ieeexplore.ieee.org/document/10666846 https://www.proquest.com/docview/3141617190 |
Volume | 20 |