PROCESSING VIDEOS BASED ON TEMPORAL STAGES

Bibliographic Details
Main Authors Beckwith, Richard; Biswas, Sovan; Manuvinakurike, Ramesh Radhakrishna; Rhodes, Anthony Daniel; Raffa, Giuseppe
Format Patent
Language English
Published 20.04.2023
Summary: Disclosed is a technical solution to process a video that captures actions to be performed for completing a task based on a chronological sequence of stages within the task. An example system may identify an action sequence from an instruction for the task. The system inputs the action sequence into a trained model (e.g., a recurrent neural network), which outputs the chronological sequence of stages. The RNN may be trained through self-supervised learning. The system may input the video and the chronological sequence of stages into another trained model, e.g., a temporal convolutional network. The other trained model may include hidden layers arranged before an attention layer. The hidden layers may extract features from the video and feed the features into the attention layer. The attention layer may determine attention weights of the features based on the chronological sequence of stages.
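The attention step described in the summary can be illustrated with a minimal sketch. The summary does not say how the attention weights are computed, so the scaled dot-product scoring used here is an assumption, not the patented method. The names stage_attention, frame_features, and stage_embeddings are hypothetical, and the toy vectors stand in for features that the hidden layers would extract from the video and for learned representations of the stages.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def stage_attention(frame_features, stage_embeddings):
    """Return, for each stage, a list of attention weights over the frames.

    Assumption: scaled dot-product scoring between a stage embedding and
    each frame feature. The source does not specify the scoring function.
    """
    scale = math.sqrt(len(frame_features[0]))
    weights = []
    for stage in stage_embeddings:  # stages are in chronological order
        scores = [dot(stage, feature) / scale for feature in frame_features]
        weights.append(softmax(scores))
    return weights


# Toy data (hypothetical): four frame features and two stage embeddings.
frame_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
stage_embeddings = [[4.0, 0.0], [0.0, 4.0]]

weights = stage_attention(frame_features, stage_embeddings)
for index, row in enumerate(weights):
    print(f"stage {index}:", [round(w, 3) for w in row])
```

In this toy setup the first stage places most of its weight on the early frames and the second stage on the later ones, which is the kind of stage-dependent weighting the summary describes.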
Bibliography: Application Number: US202218050757