POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models

Bibliographic Details
Published in IEEE Pacific Visualization Symposium, pp. 36-46
Main Authors He, Jianben; Wang, Xingbo; Liu, Shiyi; Wu, Guande; Silva, Claudio; Qu, Huamin
Format Conference Proceeding
Language English
Published IEEE 22.04.2025
Subjects
ISSN 2165-8773
DOI 10.1109/PacificVis64226.2025.00010

Abstract Large language models (LLMs) have exhibited impressive abilities for multimodal content comprehension and reasoning with proper prompting in zero- or few-shot settings. Despite the proliferation of interactive systems developed to support prompt engineering for LLMs across various tasks, most have primarily focused on textual or visual inputs, thus neglecting the complex interplay between modalities in multimodal inputs. This oversight hinders the development of effective prompts that guide models' multimodal reasoning processes by fully exploiting the rich context provided by multiple modalities. In this paper, we present POEM, a visual analytics system to facilitate efficient prompt engineering for steering the multimodal reasoning performance of LLMs. The system enables users to explore the interaction patterns across modalities at varying levels of detail for a comprehensive understanding of the multimodal knowledge elicited by various prompts. Through diverse recommendations of demonstration examples and instructional principles, POEM supports users in iteratively crafting and refining prompts to better align and enhance model knowledge with human insights. The effectiveness and efficiency of our system are validated through quantitative and qualitative evaluations with experts.
Author Wang, Xingbo
Liu, Shiyi
Silva, Claudio
Wu, Guande
Qu, Huamin
He, Jianben
Author_xml – sequence: 1
  givenname: Jianben
  surname: He
  fullname: He, Jianben
  email: jhebt@ust.hk
  organization: Hong Kong University of Science and Technology
– sequence: 2
  givenname: Xingbo
  surname: Wang
  fullname: Wang, Xingbo
  email: Xingbo.wang@us.bosch.com
  organization: Bosch Center for Artificial Intelligence (BCAI), Bosch Research North America
– sequence: 3
  givenname: Shiyi
  surname: Liu
  fullname: Liu, Shiyi
  email: shiyiliu@asu.edu
  organization: Arizona State University
– sequence: 4
  givenname: Guande
  surname: Wu
  fullname: Wu, Guande
  email: guandewu@nyu.edu
  organization: New York University
– sequence: 5
  givenname: Claudio
  surname: Silva
  fullname: Silva, Claudio
  email: csilva@nyu.edu
  organization: New York University
– sequence: 6
  givenname: Huamin
  surname: Qu
  fullname: Qu, Huamin
  email: huamin@ust.hk
  organization: Hong Kong University of Science and Technology
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/PacificVis64226.2025.00010
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISBN 9798331505813
EISSN 2165-8773
EndPage 46
ExternalDocumentID 11021043
Genre orig-research
IEDL.DBID RIE
IngestDate Wed Aug 27 01:43:03 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 11
ParticipantIDs ieee_primary_11021043
PublicationCentury 2000
PublicationDate 2025-April-22
PublicationDateYYYYMMDD 2025-04-22
PublicationDate_xml – month: 04
  year: 2025
  text: 2025-April-22
  day: 22
PublicationDecade 2020
PublicationTitle IEEE Pacific Visualization Symposium
PublicationTitleAbbrev PACIFICVIS
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0000941586
Score 2.288908
SourceID ieee
SourceType Publisher
StartPage 36
SubjectTerms Cognition
Context modeling
Interactive systems
Large language models
multimodal large language models
multimodal reasoning
Optimization
Prompt engineering
Refining
Usability
Visual analytics
Title POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models
URI https://ieeexplore.ieee.org/document/11021043