Training method of cross-modal video text retrieval model based on event awareness

Bibliographic Details
Main Authors: LI BING, HU WEIMING, YUAN CHUNFENG, MA ZONGYANG, ZHANG ZIQI
Format: Patent
Language: Chinese, English
Published: 30.07.2024

Summary: The invention, in the technical field of machine learning, provides a training method for an event-aware cross-modal video-text retrieval model. The method comprises the steps of: obtaining a sample video and an initial retrieval model, the sample video including a textual description of each video frame; extracting the frame features of each video frame and the video features of the sample video; performing event-content alignment between the frame features of the video frames and the text features of the corresponding frame descriptions, to determine an event-content perception loss; performing event-temporal alignment between the video features and the overall text features of the sample video, to determine an event-temporal perception loss; and obtaining the cross-modal video-text retrieval model based on the event-content perception loss and the event-temporal perception loss. According to the method provided by the invention, through event-content alignment at video-frame granularity and eve…
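The abstract pairs two alignment objectives: a content loss matching per-frame features against the text features of each frame's description, and a temporal loss matching video-level features against the overall caption features. The patent does not specify the exact loss functions, so the sketch below assumes a standard symmetric InfoNCE-style contrastive loss for both levels; all variable names (`frame_feats`, `frame_text_feats`, `video_feat`, `text_feat`) and the temperature value are hypothetical.

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss: row i of `a` is the positive match of row i of `b`.

    This is a common choice for cross-modal alignment; it is an assumption here,
    not the loss the patent necessarily uses.
    """
    # L2-normalize so that dot products are cosine similarities
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # cross-entropy with the matched pair on the diagonal, in both directions
    log_probs_ab = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_probs_ba = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_ab = -np.mean(np.diag(log_probs_ab))
    loss_ba = -np.mean(np.diag(log_probs_ba))
    return (loss_ab + loss_ba) / 2

# Random stand-ins for encoder outputs (the real features would come from
# the video and text encoders of the initial retrieval model).
rng = np.random.default_rng(0)
frame_feats = rng.normal(size=(8, 64))       # per-frame visual features
frame_text_feats = rng.normal(size=(8, 64))  # features of each frame's description
video_feat = rng.normal(size=(4, 64))        # video-level features (batch of 4)
text_feat = rng.normal(size=(4, 64))         # overall caption features

content_loss = info_nce(frame_feats, frame_text_feats)  # event-content alignment
temporal_loss = info_nce(video_feat, text_feat)         # event-temporal alignment
total = content_loss + temporal_loss                    # joint training objective
print(float(total))
```

In training, `total` would be backpropagated through both encoders so that frame-level and video-level representations are aligned with their textual counterparts simultaneously.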
Bibliography: Application Number: CN202410845065