Content understanding method and device for video static scene, equipment and medium

Bibliographic Details
Main Authors: LU XINKAI, GU JIAXIN, SHEN XIONG
Format: Patent
Language: Chinese, English
Published: 03.05.2024
Summary: The invention discloses a method, device, equipment, and medium for understanding static scene content in video. The method comprises the following steps: acquiring video data; classifying the video data with a fine-tuned CLIP model and outputting the scene content result with the maximum probability; extracting visual features from the video data with the visual encoder of a pre-trained Video-LLaMA model, matching and aligning these visual features against the pre-trained text features already present in the Video-LLaMA model, and outputting the text features that correspond to the current visual features; and polishing the text features corresponding to the current visual features with a large language model to output a scene content t...
Bibliography: Application Number: CN202410169799
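
Illustration: the two feature-level steps named in the Summary (picking the maximum-probability scene label, and matching a visual feature to the closest existing text feature) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the patented implementation: it uses hand-written toy vectors in place of real CLIP or Video-LLaMA embeddings, the function names (`classify_scene`, `match_text_feature`) and the `temperature` value are hypothetical, and cosine similarity with a softmax is assumed as the scoring rule because the record does not specify one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_scene(visual_feature, label_features, temperature=0.01):
    """Return (label, probability) for the most probable scene label.

    `label_features` maps label -> embedding. `temperature` is an assumed
    scaling factor applied to the similarities before the softmax.
    """
    labels = list(label_features)
    logits = [
        cosine_similarity(visual_feature, label_features[label]) / temperature
        for label in labels
    ]
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


def match_text_feature(visual_feature, text_bank):
    """Return the key of the stored text feature closest to the visual feature."""
    return max(
        text_bank,
        key=lambda key: cosine_similarity(visual_feature, text_bank[key]),
    )


# Toy data (assumption): 3-dimensional stand-ins for real embeddings.
label_features = {
    "kitchen": [1.0, 0.0, 0.0],
    "street": [0.0, 1.0, 0.0],
    "office": [0.0, 0.0, 1.0],
}
text_bank = {
    "a stove and a sink": [0.8, 0.1, 0.1],
    "cars on a road": [0.1, 0.9, 0.0],
}
frame_feature = [0.9, 0.1, 0.2]

scene_label, scene_prob = classify_scene(frame_feature, label_features)
matched_text = match_text_feature(frame_feature, text_bank)
```

In a full pipeline, `scene_label` and `matched_text` would then be handed to a large language model for the polishing step; that step is omitted here because the record gives no details about the prompt or model used.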