Classifying a video stream using a self-attention-based machine-learning model
Main Authors | , , |
Format | Patent |
Language | English |
Published | 03.09.2024 |
Subjects | |
Online Access | Get full text |
Summary: | In one embodiment, a method includes accessing a stream of F video frames, where each of the F video frames includes N patches that are non-overlapping, generating an initial embedding vector for each of the N×F patches in the F video frames, generating a classification embedding by processing the generated N×F initial embedding vectors using a self-attention-based machine-learning model that computes a temporal attention and a spatial attention for each of the N×F patches, and determining a class of the stream of video frames based on the generated classification embedding. |
Bibliography: | Application Number: US202117461755 |
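The summary describes a divided space-time attention scheme: patch embeddings are refined with a temporal attention (across frames at the same spatial location) and a spatial attention (across patches within a frame), and a classification embedding then determines the class of the stream. The following is a minimal, illustrative sketch of that idea in Python/NumPy, not the patented implementation: it uses single-head attention with shared, randomly initialized weights, and all names (`classify_stream`, `self_attention`, `dim`, and so on) are assumptions made here for illustration.

```python
# Minimal sketch of divided space-time self-attention for classifying a video
# stream of F frames, each split into N non-overlapping patches. Illustrative
# only: single-head attention, shared random weights, hypothetical names.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # x: (tokens, dim); returns attention-weighted combinations of the value vectors.
    q, k, v = x @ wq, x @ wk, x @ wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return weights @ v

def classify_stream(frames, dim=64, num_classes=10, seed=0):
    """frames: (F, N, patch_dim) array of N flattened, non-overlapping patches per frame."""
    F, N, patch_dim = frames.shape
    rng = np.random.default_rng(seed)

    # Hypothetical learned parameters, random here for the sketch.
    w_embed = rng.normal(scale=0.02, size=(patch_dim, dim))
    wq, wk, wv = [rng.normal(scale=0.02, size=(dim, dim)) for _ in range(3)]
    cls_token = rng.normal(scale=0.02, size=(1, dim))
    w_out = rng.normal(scale=0.02, size=(dim, num_classes))

    # 1) Initial embedding vector for each of the N x F patches.
    x = frames @ w_embed                                            # (F, N, dim)

    # 2) Temporal attention: each patch attends across frames at the same spatial index.
    x = np.stack([self_attention(x[:, n], wq, wk, wv) for n in range(N)], axis=1)

    # 3) Spatial attention: each patch attends across all patches within its frame.
    x = np.stack([self_attention(x[f], wq, wk, wv) for f in range(F)], axis=0)

    # 4) Classification embedding: a class token attends over all N x F patch embeddings.
    tokens = np.concatenate([cls_token, x.reshape(F * N, dim)], axis=0)
    cls_embedding = self_attention(tokens, wq, wk, wv)[0]

    # 5) Determine the class of the stream from the classification embedding.
    return int(np.argmax(cls_embedding @ w_out))

# Example: a random stream of 8 frames, each with 16 flattened 16x16 RGB patches.
stream = np.random.default_rng(1).normal(size=(8, 16, 3 * 16 * 16))
print(classify_stream(stream))
```

In a trained model the projection matrices, class token, and output head would be learned parameters rather than random draws, and the temporal and spatial attention steps would typically be repeated over several layers; the sketch keeps one pass of each to mirror the steps listed in the summary.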