MaRU: A Manga Retrieval and Understanding System Connecting Vision and Language
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published | 22.10.2023 |
Subjects | |
Summary: | Manga, a widely celebrated Japanese comic art form, is renowned for its diverse narratives and distinct artistic styles. However, the inherently visual and intricate structure of Manga, which comprises images housing multiple panels, poses significant challenges for content retrieval. To address this, we present MaRU (Manga Retrieval and Understanding), a multi-staged system that connects vision and language to facilitate efficient search of both dialogues and scenes within Manga frames. The architecture of MaRU integrates an object detection model for identifying text and frame bounding boxes, a Vision Encoder-Decoder model for text recognition, a text encoder for embedding text, and a vision-text encoder that merges textual and visual information into a unified embedding space for scene retrieval. Rigorous evaluations reveal that MaRU excels in end-to-end dialogue retrieval and exhibits promising results for scene retrieval. |
---|---|
DOI: | 10.48550/arxiv.2311.02083 |
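
The dialogue-retrieval stage described in the summary (embed the recognized text, then search by similarity) can be illustrated with a minimal sketch. This is not MaRU's implementation: the `Frame` record, the function names, and the sample dialogue are hypothetical, and a toy bag-of-words vector stands in for the learned text encoder. The detection and text-recognition stages are assumed to have already produced the frame boxes and dialogue strings.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical record for one detected manga frame."""
    page: int
    box: tuple      # (x1, y1, x2, y2), as a frame detector might return
    dialogue: str   # text assumed to come from a text-recognition stage


def embed_text(text):
    """Toy stand-in for a learned text encoder: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_dialogue(query, frames, top_k=1):
    """Rank frames by similarity between the query and each frame's dialogue."""
    q = embed_text(query)
    scored = [(cosine(q, embed_text(f.dialogue)), f) for f in frames]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up sample data, for illustration only.
frames = [
    Frame(1, (0, 0, 100, 100), "where are you going"),
    Frame(1, (100, 0, 200, 100), "i will protect this village"),
    Frame(2, (0, 0, 200, 150), "the village is under attack"),
]

best_score, best_frame = search_dialogue("who will protect the village", frames)[0]
print(best_frame.page, best_frame.box, round(best_score, 2))
```

In a real system the count vectors would be replaced by dense embeddings from a trained encoder, and scene retrieval would compare the query against joint vision-text embeddings of each frame rather than the dialogue alone.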