EduCross: Dual adversarial bipartite hypergraph learning for cross-modal retrieval in multimodal educational slides
In the digital education landscape, cross-modal retrieval (CMR) from multimodal educational slides represents a significant challenge, particularly because of the complex nature of academic content, which includes images, diagrams, equations, and tables across various subjects such as mathematics an...
Saved in:
Published in | Information fusion Vol. 109; p. 102428 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published |
Elsevier B.V
01.09.2024
|
Subjects | |
Online Access | Get full text |
ISSN | 1566-2535 1872-6305 |
DOI | 10.1016/j.inffus.2024.102428 |
Cover
Loading…
Summary: | In the digital education landscape, cross-modal retrieval (CMR) from multimodal educational slides represents a significant challenge, particularly because of the complex nature of academic content, which includes images, diagrams, equations, and tables across various subjects such as mathematics and biology. Current CMR systems are primarily designed for “(natural) image to text” interactions (or vice versa) and inadequately address real-world educational scenarios. This study presents EduCross, a novel framework devised to enhance CMR within multimodal educational slides, which is a domain in which traditional retrieval systems fall short. Recognizing the imperative for a system that is tailored to the educational context, EduCross integrates dual adversarial bipartite hypergraph learning, harnessing the capabilities of generative adversarial networks with figure-text dual channels. This powerful combination facilitates robust bidirectional mapping, allowing for the precise association of figures with their descriptive spoken language segments and ensuring a comprehensive CMR experience. Specifically, we develop framelet-based deep bipartite hypergraph neural networks that effectively manage the high-order relationships between diverse educational content types and various types of slide figures. Our experimental results underscore the superior performance of EduCross, demonstrating its effectiveness through the use of the real Multimodal Lecture Presentations dataset that mirrors authentic educational settings. These outcomes highlight the significant advancements of EduCross over existing methods, marking a leap forward in the accurate retrieval of multimodal educational content.
•A novel dual adversarial bipartite hypergraph learning method for cross-modal retrieval.•Novel framelet-based deep bipartite hypergraph neural networks are developed.•EduCross combines adversarial learning with bipartite hypergraph learning.•EduCross achieves SOTA results on the real-world multimodal educational slide dataset. |
---|---|
ISSN: | 1566-2535 1872-6305 |
DOI: | 10.1016/j.inffus.2024.102428 |