Collaborative three-stream transformers for video captioning
As the most critical components in a sentence, subject, predicate and object require special attention in the video captioning task. To implement this idea, we design a novel framework, named COllaborative three-Stream Transformers (COST), to model the three parts separately and complement each othe...
Saved in:
Published in | Computer vision and image understanding Vol. 235; p. 103799 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published |
Elsevier Inc
01.10.2023
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Be the first to leave a comment!