Multimodal video behavior recognition method based on language-vision contrastive learning

Bibliographic Details
Main Authors	ZHANG YING, ZHANG BINGBING, AN FENGMIN, ZHANG JIANXIN, DONG WEI, ZHANG QIANG
Format	Patent
Language	Chinese; English
Published	08.12.2023
Subjects	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Summary:	The invention discloses a multimodal video behavior recognition method based on language-vision contrastive learning, comprising the following steps: acquiring video data and a language description of the label corresponding to each video, dividing the resulting language-video data set into a training set and a test set, and performing frame extraction on the video data; using a Contrastive Language-Image Pre-training (CLIP) model as the base network and extending it to construct a video multimodal network based on language-vision contrastive learning, which classifies videos according to the similarity between video features and language features; and iteratively training the video multimodal network on the language and video data in the training set to update the network parameters, where the training process comprises forward propagation of network features and backpropagation of errors; network parameters are updated in each iteration, training and verification
Bibliography:	Application Number: CN202310526292
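The summary does not disclose the network's internal details. As a rough illustration only, the following is a minimal sketch of a generic CLIP-style video classification pipeline of the kind the summary describes: frames are sampled uniformly, per-frame embeddings are averaged over time into one video embedding, each label is scored by temperature-scaled cosine similarity against its text embedding, and training uses a symmetric contrastive (InfoNCE) loss. Every function name, the middle-of-segment sampling, the mean pooling, and the temperature of 0.01 are hypothetical choices, not the patented method; the embeddings would come from the image and text encoders of a CLIP model, which are not reproduced here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sample_frame_indices(num_frames: int, num_segments: int) -> List[int]:
    # Frame extraction (assumed scheme, hypothetical): split the clip into
    # equal segments and keep the middle frame of each one.
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    seg_len = num_frames / num_segments
    return [min(num_frames - 1, int(seg_len * i + seg_len / 2))
            for i in range(num_segments)]


def l2_normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def video_feature(frame_features: Sequence[Sequence[float]]) -> Vector:
    # Temporal fusion (assumed): average the per-frame image embeddings,
    # then L2-normalize the result.
    n, dim = len(frame_features), len(frame_features[0])
    mean = [sum(f[d] for f in frame_features) / n for d in range(dim)]
    return l2_normalize(mean)


def label_logits(video_vec: Sequence[float],
                 text_vecs: Sequence[Sequence[float]],
                 temperature: float = 0.01) -> Vector:
    # Cosine similarity between the video embedding and each label-text
    # embedding, divided by a temperature as in CLIP-style models.
    v = l2_normalize(video_vec)
    return [sum(a * b for a, b in zip(v, l2_normalize(t))) / temperature
            for t in text_vecs]


def log_softmax(logits: Sequence[float]) -> Vector:
    # Numerically stable: shift by the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def softmax(logits: Sequence[float]) -> Vector:
    return [math.exp(x) for x in log_softmax(logits)]


def contrastive_loss(video_vecs: Sequence[Sequence[float]],
                     text_vecs: Sequence[Sequence[float]],
                     temperature: float = 0.01) -> float:
    # Symmetric InfoNCE loss for a square batch in which video i is paired
    # with text i: cross-entropy over video->text rows plus text->video columns.
    n = len(video_vecs)
    rows = [label_logits(v, text_vecs, temperature) for v in video_vecs]
    cols = [[rows[i][j] for i in range(n)] for j in range(n)]

    def cross_entropy(matrix: List[Vector]) -> float:
        # The correct pair for row i sits on the diagonal, at index i.
        return -sum(log_softmax(r)[i] for i, r in enumerate(matrix)) / n

    return (cross_entropy(rows) + cross_entropy(cols)) / 2
```

For example, `sample_frame_indices(100, 4)` returns `[12, 37, 62, 87]`, and the predicted label for a video is the index of the largest value in `softmax(label_logits(...))`. In a real system the loss would be minimized with an automatic-differentiation framework such as PyTorch, which supplies the backpropagation of errors mentioned in the summary; this plain-Python version only shows the arithmetic of the forward pass.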