Content based video retrieval system using two stream convolutional neural network

Bibliographic Details
Published in: Multimedia Tools and Applications, Vol. 82, No. 16, pp. 24465-24483
Main Authors: Sowmyayani, S.; Rani, P. Arockia Jansi
Format: Journal Article
Language: English
Published: New York: Springer US, 01.07.2023 (Springer Nature B.V.)

Summary: Nowadays, capturing video with mobile phones and digital cameras and uploading it to social media is a common practice. These videos lack semantic tags, which makes searching them difficult for web users. Content Based Video Retrieval (CBVR) helps to identify the most relevant videos for a given video query. The objective of this paper is to retrieve the most relevant videos for a given query video in reduced time. To meet this objective, the paper proposes an efficient video retrieval system that uses salient object detection and keyframe extraction to reduce the high dimensionality of video data. Spatio-temporal features are extracted with a two-stream Convolutional Neural Network (CNN) and stored in a feature dataset. The salient objects are used to search for the exact subject given in the query. Relevant videos are identified by matching the features of the query video against the feature dataset built from the input dataset. To reduce the complexity of similarity matching, the proposed method replaces the feature dataset with a classification score dataset. Experiments are conducted on the TRECVID and CC_Web_Video datasets and evaluated using precision, recall, specificity, accuracy and F-score. The proposed method is compared with recent methods and achieves approximately 99.68% precision on TRECVID and 88.9% precision on CC_Web_Video. It outperforms most recent methods by a 0.001 increase in mean Average Precision (mAP) on CC_Web_Video and a 4% increase in precision on TRECVID. The computation time is reduced by 100 min on TRECVID and 200 min on CC_Web_Video.
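The abstract describes extracting spatio-temporal features with a two-stream CNN (a spatial stream over keyframes and a temporal stream over motion) and ranking database videos against a query by comparing compact classification-score vectors instead of raw features. The sketch below illustrates that general idea in PyTorch; the ResNet-18 backbones, the number of classes, the optical-flow stacking depth and the cosine-similarity ranking are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of a two-stream CNN plus classification-score retrieval,
# assuming ResNet-18 backbones and 10 stacked optical-flow fields (not the
# paper's exact architecture).
import torch
import torch.nn as nn
import torchvision.models as models

NUM_CLASSES = 10  # assumed number of video categories

class TwoStreamCNN(nn.Module):
    """Spatial stream sees an RGB keyframe; temporal stream sees stacked optical flow."""
    def __init__(self, num_classes=NUM_CLASSES, flow_stack=10):
        super().__init__()
        self.spatial = models.resnet18(weights=None)   # RGB keyframe stream
        self.temporal = models.resnet18(weights=None)  # optical-flow stream
        # Temporal stream input: 2 * flow_stack flow channels instead of 3 RGB channels.
        self.temporal.conv1 = nn.Conv2d(2 * flow_stack, 64, kernel_size=7,
                                        stride=2, padding=3, bias=False)
        feat_dim = self.spatial.fc.in_features
        self.spatial.fc = nn.Identity()                # keep 512-d features
        self.temporal.fc = nn.Identity()
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb, flow):
        fused = torch.cat([self.spatial(rgb), self.temporal(flow)], dim=1)
        return self.classifier(fused)  # classification scores used for retrieval

def retrieve(query_scores, score_dataset, top_k=5):
    """Rank database videos by cosine similarity of classification-score vectors."""
    sims = torch.nn.functional.cosine_similarity(
        query_scores.unsqueeze(0), score_dataset, dim=1)
    return torch.topk(sims, k=top_k).indices

if __name__ == "__main__":
    model = TwoStreamCNN()
    rgb = torch.randn(1, 3, 224, 224)    # one extracted keyframe
    flow = torch.randn(1, 20, 224, 224)  # 10 stacked (x, y) flow fields
    with torch.no_grad():
        query_scores = model(rgb, flow).squeeze(0)
    score_dataset = torch.randn(100, NUM_CLASSES)  # dummy pre-computed score dataset
    print(retrieve(query_scores, score_dataset))
```

In this sketch, matching is done over short classification-score vectors rather than the much larger concatenated feature vectors, which mirrors the paper's stated motivation for replacing the feature dataset with a classification score dataset to cut similarity-matching cost.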
ISSN: 1380-7501; 1573-7721
DOI: 10.1007/s11042-023-14784-5