Multimodal video behavior recognition method based on language-vision contrastive learning
Main Authors | , , , , , |
---|---|
Format | Patent |
Language | Chinese, English |
Published | 08.12.2023 |
Subjects | |
Summary: | The invention discloses a multimodal video behavior recognition method based on language-vision contrastive learning, comprising the following steps: acquiring video data and the language descriptions of the labels corresponding to the video data, dividing the language-video data set into a training set and a test set, and extracting frames from the video data; using a Contrastive Language-Image Pre-training (CLIP) model as the base network and extending it to construct a video multimodal network based on language-vision contrastive learning, which classifies videos according to the similarity between video features and language features; iteratively training the video multimodal network on the language and video data in the training set to update the network parameters, where the training process comprises forward propagation of network features and back-propagation of errors; network parameters are updated in each iteration, training and verification |
---|---|
Bibliography: | Application Number: CN202310526292 |
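
The similarity-based classification step described in the summary can be illustrated with a minimal sketch. This is not the patented network: the record does not say how frame features are aggregated or how the base network is extended, so the mean pooling over frames, the `temperature` value, and the function names `l2_normalize` and `classify_video` are all assumptions made for illustration. Random arrays stand in for the features a CLIP-style image encoder and text encoder would produce.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def classify_video(frame_features, label_features, temperature=0.01):
    """Pick the label whose language feature is most similar to the video feature.

    frame_features: (T, D) array, one feature per extracted frame (placeholder data).
    label_features: (C, D) array, one feature per label description (placeholder data).
    Mean pooling over frames is an assumption, not taken from the patent record.
    """
    video_feature = l2_normalize(frame_features.mean(axis=0))  # (D,)
    label_features = l2_normalize(label_features)              # (C, D)
    logits = label_features @ video_feature / temperature      # (C,) scaled cosine similarities
    logits = logits - logits.max()                             # numerical stability for softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(np.argmax(probs)), probs


# Hypothetical stand-in features: 3 labels, 4 frames, 8-dimensional embeddings.
rng = np.random.default_rng(0)
label_features = rng.normal(size=(3, 8))
# Frames are built as noisy copies of the feature for label 1.
frame_features = label_features[1] + 0.05 * rng.normal(size=(4, 8))

pred, probs = classify_video(frame_features, label_features)
print(pred, probs)
```

In a trained system the same similarity scores would feed a contrastive loss during the forward pass, and the gradients of that loss would update the network parameters during back-propagation; that training loop is omitted here because the record gives no details of it.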