EduCross: Dual adversarial bipartite hypergraph learning for cross-modal retrieval in multimodal educational slides


Bibliographic Details
Published in Information fusion Vol. 109; p. 102428
Main Authors Li, Ming; Zhou, Siwei; Chen, Yuting; Huang, Changqin; Jiang, Yunliang
Format Journal Article
Language English
Published Elsevier B.V. 01.09.2024
ISSN 1566-2535
1872-6305
DOI 10.1016/j.inffus.2024.102428

Summary: In the digital education landscape, cross-modal retrieval (CMR) from multimodal educational slides represents a significant challenge, particularly because of the complex nature of academic content, which includes images, diagrams, equations, and tables across various subjects such as mathematics and biology. Current CMR systems are primarily designed for “(natural) image to text” interactions (or vice versa) and inadequately address real-world educational scenarios. This study presents EduCross, a novel framework devised to enhance CMR within multimodal educational slides, which is a domain in which traditional retrieval systems fall short. Recognizing the imperative for a system that is tailored to the educational context, EduCross integrates dual adversarial bipartite hypergraph learning, harnessing the capabilities of generative adversarial networks with figure-text dual channels. This powerful combination facilitates robust bidirectional mapping, allowing for the precise association of figures with their descriptive spoken language segments and ensuring a comprehensive CMR experience. Specifically, we develop framelet-based deep bipartite hypergraph neural networks that effectively manage the high-order relationships between diverse educational content types and various types of slide figures. Our experimental results underscore the superior performance of EduCross, demonstrating its effectiveness through the use of the real Multimodal Lecture Presentations dataset that mirrors authentic educational settings. These outcomes highlight the significant advancements of EduCross over existing methods, marking a leap forward in the accurate retrieval of multimodal educational content.
• A novel dual adversarial bipartite hypergraph learning method for cross-modal retrieval.
• Novel framelet-based deep bipartite hypergraph neural networks are developed.
• EduCross combines adversarial learning with bipartite hypergraph learning.
• EduCross achieves SOTA results on the real-world multimodal educational slide dataset.