An attention-based hybrid deep learning approach for Bengali video captioning

Bibliographic Details
Published in Journal of King Saud University. Computer and information sciences Vol. 35; no. 1; pp. 257-269
Main Authors Zaoad, Md. Shahir; Mannan, M.M. Rushadul; Mandol, Angshu Bikash; Rahman, Mostafizur; Islam, Md. Adnanul; Rahman, Md. Mahbubur
Format Journal Article
Language English
Published Elsevier B.V 01.01.2023
Elsevier
Summary: Video captioning is the automated process of generating a caption for a video by understanding its content. Although numerous studies have addressed video captioning in English, video captioning in Bengali remains nearly unexplored. This research therefore aims to generate Bengali captions that plausibly describe the gist of a given video and to identify the best-performing model for Bengali video captioning. To accomplish this, several sequence-to-sequence models (LSTM, BiLSTM, and GRU) are implemented. Each takes as input video frame features extracted by one of several CNN models (VGG-19, Inceptionv3, and ResNet50v2) and produces a corresponding textual description as output. Moreover, the attention mechanism is incorporated into these models, a first attempt of its kind in Bengali video captioning. A novel Bengali video captioning dataset is constructed from the Microsoft Research Video Description Corpus (MSVD), an English video captioning dataset, using a deep learning-based translator followed by manual post-editing. Finally, the models' performance is evaluated with the popular metrics BLEU, METEOR, and ROUGE. The proposed attention-based hybrid model outperforms the existing models on these metrics, establishing a new benchmark for Bengali video captioning.
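
The attention step mentioned in the summary can be illustrated with a small sketch. This is not the authors' implementation: the record does not say which attention variant the paper uses, so dot-product scoring is an assumption here, and the function name `attend`, the toy feature vectors, and the decoder state are all hypothetical. The idea shown is only the general one: at each decoding step, every frame's feature vector is scored against the current decoder state, the scores are normalized with a softmax, and the weighted sum of frame features becomes the context for predicting the next word.

```python
import math

def attend(decoder_state, frame_features):
    """Dot-product attention over per-frame feature vectors (assumed variant).

    Returns (weights, context): one softmax weight per frame, and the
    weighted sum of the frame feature vectors.
    """
    # Score each frame by its dot product with the decoder state.
    scores = [sum(h * f for h, f in zip(decoder_state, frame))
              for frame in frame_features]
    # Numerically stable softmax over the frame scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the frame features.
    dim = len(frame_features[0])
    context = [sum(w * frame[d] for w, frame in zip(weights, frame_features))
               for d in range(dim)]
    return weights, context

# Hypothetical toy input: three frames with 2-dimensional features.
# Real CNN features would have far more dimensions.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(state, frames)
```

In this toy case the first and third frames align equally well with the decoder state, so they receive equal weight, while the second frame receives the smallest weight. In a trained model the scoring function would have learned parameters rather than being a fixed dot product.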
ISSN: 1319-1578
2213-1248
DOI: 10.1016/j.jksuci.2022.11.015