Transformer Module Networks for Systematic Generalization in Visual Question Answering

Bibliographic Details
Published in	IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 46, no. 12, pp. 10096-10105
Main Authors	Yamada, Moyuru; D'Amario, Vanessa; Takemoto, Kentaro; Boix, Xavier; Sasaki, Tomotake
Format	Journal Article
Language	English
Published	United States: IEEE, 01.12.2024
Subjects	Cognition; Libraries; Neural module network; Question answering (information retrieval); systematic generalization; Systematics; Training; transformer; Transformers; visual question answering; Visualization
Summary:	Transformers achieve great performance on Visual Question Answering (VQA). However, their systematic generalization capabilities, i.e., their ability to handle novel combinations of known concepts, are unclear. We reveal that Neural Module Networks (NMNs), i.e., question-specific compositions of modules that each tackle a sub-task, achieve systematic generalization performance better than or similar to that of conventional Transformers, even though NMNs' modules are CNN-based. To address this shortcoming of Transformers with respect to NMNs, in this paper we investigate whether and how modularity can bring benefits to Transformers. Namely, we introduce the Transformer Module Network (TMN), a novel NMN based on compositions of Transformer modules. TMNs achieve state-of-the-art systematic generalization performance on three VQA datasets, improving by more than 30% over standard Transformers for novel compositions of sub-tasks. We show that not only the module composition but also the module specialization for each sub-task is key to this performance gain.
ISSN:	0162-8828, 1939-3539, 2160-9292
DOI:	10.1109/TPAMI.2024.3438887