XMeCap: Meme Caption Generation with Sub-Image Adaptability
Main Authors | Chen, Yuyan; Yan, Songzhou; Zhu, Zhihong; Li, Zhixu; Xiao, Yanghua |
Format | Journal Article (arXiv preprint) |
Language | English |
Published | 24.07.2024 |
Subjects | Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition |
DOI | 10.48550/arxiv.2407.17152 |
Copyright | http://creativecommons.org/licenses/by/4.0 |
Online Access | https://arxiv.org/abs/2407.17152 |
Abstract | Humor, deeply rooted in societal meanings and cultural details, poses a unique challenge for machines. While advances have been made in natural language processing, real-world humor often thrives in a multi-modal context, encapsulated distinctively by memes. This paper places particular emphasis on the impact of multiple images on meme captioning. We then introduce the XMeCap framework, a novel approach that combines supervised fine-tuning with reinforcement learning based on an innovative reward model, which factors in both global and local similarities between visuals and text. Benchmarked against contemporary models, our results show a marked improvement in caption generation for both single-image and multi-image memes, as well as across meme categories. XMeCap achieves an average evaluation score of 75.85 for single-image memes and 66.32 for multi-image memes, outperforming the best baseline by 3.71% and 4.82%, respectively. This research not only establishes a new frontier in meme-related studies but also underscores the potential of machines to understand and generate humor in a multi-modal setting. |
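The reward model described in the abstract combines a global image-caption similarity with local similarities against each sub-image. Below is a minimal sketch of that decomposition, assuming CLIP-style joint embeddings are already computed; the function names, the mean-pooling of the local term, and the weighting parameter `alpha` are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def meme_caption_reward(
    caption_emb: np.ndarray,         # text embedding of the generated caption
    global_img_emb: np.ndarray,      # embedding of the full meme image
    sub_img_embs: list[np.ndarray],  # embeddings of each sub-image/panel
    alpha: float = 0.5,              # illustrative global/local trade-off weight
) -> float:
    """Reward as a weighted sum of the global image-caption similarity
    and the mean caption similarity over the sub-images (local term)."""
    global_sim = cosine(caption_emb, global_img_emb)
    if sub_img_embs:
        local_sim = float(np.mean([cosine(caption_emb, e) for e in sub_img_embs]))
    else:
        # Single-image meme: fall back to the global similarity alone.
        local_sim = global_sim
    return alpha * global_sim + (1.0 - alpha) * local_sim
```

In the paper, the reward model is learned rather than fixed, and it drives reinforcement learning on top of a supervised fine-tuned captioner; the weighted sum above only illustrates how a global and a local visual-text term can be combined into one scalar reward.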