An attention-based hybrid deep learning approach for Bengali video captioning
Published in | Journal of King Saud University. Computer and information sciences Vol. 35; no. 1; pp. 257 - 269 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V, 01.01.2023 |
Summary: | Video captioning is the automated process of describing a video in text by understanding its content. Although video captioning in English has been studied extensively, video captioning in Bengali remains nearly unexplored. This research therefore aims to generate Bengali captions that plausibly describe the gist of a given video and to identify the best-performing model for Bengali video captioning. To accomplish this, several sequence-to-sequence models (LSTM, BiLSTM, and GRU) are implemented; they take as input video frame features extracted by different CNN models (VGG-19, Inceptionv3, and ResNet50v2) and produce a corresponding textual description as output. Moreover, an attention mechanism is incorporated into these models, a first attempt of its kind in Bengali video captioning. A novel Bengali video captioning dataset is constructed from the Microsoft Research Video Description Corpus (MSVD), an English video captioning dataset, using a deep learning-based translator and manual post-editing. Finally, the models' performance is evaluated with popular evaluation metrics: BLEU, METEOR, and ROUGE. The proposed attention-based hybrid model outperforms the existing models on these metrics, establishing a new benchmark for Bengali video captioning. |
---|---|
ISSN: | 1319-1578 2213-1248 |
DOI: | 10.1016/j.jksuci.2022.11.015 |
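
The summary describes an attention mechanism applied over per-frame CNN features while a sequence-to-sequence decoder generates the caption. This record does not say which attention scoring function the authors used, so the following is only a minimal sketch that assumes plain dot-product attention. The names `softmax`, `attend`, `frames`, and `state` are hypothetical, and the toy vectors stand in for real CNN features and a real decoder hidden state.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, frame_features):
    """Dot-product attention (an assumed variant, not confirmed by the record).

    Scores each frame by its dot product with the decoder state, then
    returns the attention weights and the weighted sum of frame features.
    """
    scores = [sum(q * k for q, k in zip(decoder_state, f)) for f in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    context = [
        sum(w * f[i] for w, f in zip(weights, frame_features)) for i in range(dim)
    ]
    return weights, context


# Toy stand-ins: 3 frames with 4-dimensional features, and one decoder state.
frames = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
state = [0.0, 2.0, 0.0, 0.0]

weights, context = attend(state, frames)
```

In this toy case the second frame is the one most similar to the decoder state, so it receives the largest weight and dominates the context vector passed to the decoder at that step.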