POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models

Bibliographic Details
Published in	IEEE Pacific Visualization Symposium, pp. 36-46
Main Authors	He, Jianben; Wang, Xingbo; Liu, Shiyi; Wu, Guande; Silva, Claudio; Qu, Huamin
Format	Conference Proceeding
Language	English
Published	IEEE 22.04.2025
Subjects	Cognition; Context modeling; Interactive systems; Large language models; multimodal large language models; multimodal reasoning; Optimization; Prompt engineering; Refining; Usability; Visual analytics
ISSN	2165-8773
DOI	10.1109/PacificVis64226.2025.00010

Abstract	Large language models (LLMs) have exhibited impressive abilities for multimodal content comprehension and reasoning with proper prompting in zero- or few-shot settings. Despite the proliferation of interactive systems developed to support prompt engineering for LLMs across various tasks, most have primarily focused on textual or visual inputs, thus neglecting the complex interplay between modalities in multimodal inputs. This oversight hinders the development of effective prompts that guide models' multimodal reasoning processes by fully exploiting the rich context provided by multiple modalities. In this paper, we present POEM, a visual analytics system to facilitate efficient prompt engineering for steering the multimodal reasoning performance of LLMs. The system enables users to explore the interaction patterns across modalities at varying levels of detail for a comprehensive understanding of the multimodal knowledge elicited by various prompts. Through diverse recommendations of demonstration examples and instructional principles, POEM supports users in iteratively crafting and refining prompts to better align and enhance model knowledge with human insights. The effectiveness and efficiency of our system are validated through quantitative and qualitative evaluations with experts.
Author Affiliations	He, Jianben (Hong Kong University of Science and Technology, jhebt@ust.hk); Wang, Xingbo (Bosch Center for Artificial Intelligence (BCAI), Bosch Research North America, Xingbo.wang@us.bosch.com); Liu, Shiyi (Arizona State University, shiyiliu@asu.edu); Wu, Guande (New York University, guandewu@nyu.edu); Silva, Claudio (New York University, csilva@nyu.edu); Qu, Huamin (Hong Kong University of Science and Technology, huamin@ust.hk)
CODEN	IEEPAD
Discipline	Engineering
EISBN	9798331505813
EISSN	2165-8773
EndPage	46
ExternalDocumentID	11021043
Genre	orig-research
IsPeerReviewed	false
IsScholarly	true
PageCount	11
PublicationTitleAbbrev	PACIFICVIS
URI	https://ieeexplore.ieee.org/document/11021043