Content understanding method and device for video static scene, equipment and medium
Main Authors | , , |
---|---|
Format | Patent |
Language | Chinese English |
Published | 03.05.2024 |
Subjects | |
Summary: | The invention discloses a method, device, equipment and medium for understanding the content of static scenes in video. The method comprises the following steps: acquiring video data; classifying the video data with a fine-tuned CLIP model and outputting the scene content result with the maximum probability; extracting features from the video data with the visual encoder of the pre-trained Video-LLaMA model to obtain visual features, aligning the visual features with the existing pre-trained text features in the Video-LLaMA model, retrieving from those text features the ones that correspond to the current visual features, and outputting them; polishing the text corresponding to the current visual features with a large language model, and outputting to obtain a scene content t |
---|---|
Bibliography: | Application Number: CN202410169799 |
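
The classification step in the summary (picking the scene content result with the maximum probability) can be illustrated with a minimal sketch. It is not the patent's implementation: it assumes CLIP-style scoring, where a visual embedding is compared with one text embedding per scene label by cosine similarity and the scores are turned into probabilities with a softmax. The function names, the toy three-dimensional embeddings, the label set, and the temperature value are all hypothetical; in practice the embeddings would come from the fine-tuned model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_scene(visual, label_embeddings, temperature=0.07):
    """Return (label, probability) for the most probable scene label.

    `temperature` is an assumed scaling factor, not a value from the patent.
    """
    labels = list(label_embeddings)
    scores = [cosine(visual, label_embeddings[name]) / temperature for name in labels]
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical toy embeddings standing in for real model outputs.
visual = [0.9, 0.1, 0.0]
label_embeddings = {
    "office": [1.0, 0.0, 0.0],
    "street": [0.0, 1.0, 0.0],
    "kitchen": [0.0, 0.0, 1.0],
}

label, prob = classify_scene(visual, label_embeddings)
print(label, round(prob, 3))
```

The same cosine-similarity lookup could plausibly serve the matching-alignment step as well, by returning the stored text feature closest to the current visual feature, though the summary does not say how that alignment is computed.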