Classifying a video stream using a self-attention-based machine-learning model


Bibliographic Details
Main Authors: Bertasius, Gediminas; Torresani, Lorenzo; Wang, Heng
Format: Patent
Language: English
Published: 03.09.2024

More Information
Summary: In one embodiment, a method includes accessing a stream of F video frames, where each of the F video frames includes N non-overlapping patches, generating an initial embedding vector for each of the N×F patches in the F video frames, generating a classification embedding by processing the generated N×F initial embedding vectors using a self-attention-based machine-learning model that computes a temporal attention and a spatial attention for each of the N×F patches, and determining a class of the stream of video frames based on the generated classification embedding.
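The summary above describes computing separate temporal and spatial attention for each patch embedding. A minimal NumPy sketch of that idea follows; the function and variable names here are illustrative, and this sketch uses single-head attention with identity query/key/value projections and a random linear classifier head, whereas the patented model would use learned projections, multiple heads, and a trained classification head.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head self-attention over the second-to-last axis,
    # with identity Q/K/V projections (a simplification; a real
    # model would use learned projection matrices).
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores) @ x

def divided_space_time_attention(patches):
    # patches: (F, N, D) — F frames, N patches per frame,
    # D-dimensional initial embedding vectors.
    # Temporal attention: each patch attends to the patches at the
    # same spatial location across all F frames.
    t = np.transpose(patches, (1, 0, 2))   # (N, F, D)
    t = self_attention(t)
    t = np.transpose(t, (1, 0, 2))         # back to (F, N, D)
    # Spatial attention: each patch then attends to all N patches
    # within its own frame.
    return self_attention(t)               # (F, N, D)

def classify(patches, n_classes=4, seed=0):
    # Pool the attended patch embeddings into a single classification
    # embedding, then apply a linear head (random here, purely for
    # illustration) to pick a class for the video stream.
    rng = np.random.default_rng(seed)
    attended = divided_space_time_attention(patches)
    cls_embedding = attended.mean(axis=(0, 1))             # (D,)
    head = rng.normal(size=(patches.shape[-1], n_classes))  # (D, C)
    return int(np.argmax(cls_embedding @ head))

# Example: a stream of F=8 frames, each split into N=16 patches
# with D=32-dimensional embeddings.
frames = np.random.default_rng(1).normal(size=(8, 16, 32))
label = classify(frames)
```

Factoring attention into a temporal pass followed by a spatial pass keeps the cost proportional to N·F·(N+F) comparisons per layer, rather than the (N·F)² of joint space-time attention over all patches at once.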
Bibliography: Application Number: US202117461755