Collaborative three-stream transformers for video captioning

As the most critical components in a sentence, subject, predicate and object require special attention in the video captioning task. To implement this idea, we design a novel framework, named COllaborative three-Stream Transformers (COST), to model the three parts separately and complement each othe...

Bibliographic Details
Published in Computer vision and image understanding Vol. 235; p. 103799
Main Authors Wang, Hao; Zhang, Libo; Fan, Heng; Luo, Tiejian
Format Journal Article
Language English
Published Elsevier Inc 01.10.2023