EXTRACTION OF SPATIAL-TEMPORAL FEATURES FROM A VIDEO

Bibliographic Details
Main Authors: YAO, Ting; MEI, Tao
Format: Patent
Language: English, French, German
Published: 17.06.2020

Summary: Implementations of the subject matter described herein provide a solution for extracting a spatial-temporal feature representation. In this solution, an input comprising a plurality of images is received at a first layer of a learning network. First features that characterize the spatial presentation of the images are extracted from the input in a spatial dimension using a first unit of the first layer. Based on the type of connection between the first unit and a second unit of the first layer, second features that at least characterize temporal changes across the images are extracted from the first features and/or the input in a temporal dimension using the second unit. A spatial-temporal feature representation of the images is then generated based in part on the second features. Through this solution, it is possible to reduce learning network sizes, improve the training and use efficiency of learning networks, and obtain accurate spatial-temporal feature representations.
Bibliography: Application Number EP20180746334
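
The following is a minimal sketch of the mechanism described in the Summary above, assuming PyTorch. The class name SpatialTemporalBlock, the "serial"/"parallel"/"serial_parallel" labels, and the specific kernel sizes are illustrative assumptions, not taken from the patent text: a first unit applies a spatial convolution, a second unit applies a temporal convolution, and the connection type determines whether the second unit operates on the first features, on the block input, or on both.

```python
# Illustrative sketch only (assumed PyTorch); names and connection labels are hypothetical.
import torch
import torch.nn as nn


class SpatialTemporalBlock(nn.Module):
    def __init__(self, channels: int, connection: str = "serial"):
        super().__init__()
        self.connection = connection
        # First unit: spatial 1x3x3 convolution over (frames, height, width) volumes.
        self.spatial = nn.Conv3d(channels, channels,
                                 kernel_size=(1, 3, 3), padding=(0, 1, 1))
        # Second unit: temporal 3x1x1 convolution across frames.
        self.temporal = nn.Conv3d(channels, channels,
                                  kernel_size=(3, 1, 1), padding=(1, 0, 0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        spatial_feats = self.spatial(x)  # "first features"
        if self.connection == "serial":
            # Second unit reads the first features only.
            return self.temporal(spatial_feats)
        elif self.connection == "parallel":
            # Second unit reads the block input directly; outputs are combined.
            return spatial_feats + self.temporal(x)
        else:  # "serial_parallel"
            # Second unit reads the first features, which also bypass it.
            return spatial_feats + self.temporal(spatial_feats)


# Usage: an 8-frame clip with 16 feature channels at 112x112 resolution.
clip = torch.randn(1, 16, 8, 112, 112)
block = SpatialTemporalBlock(channels=16, connection="parallel")
print(block(clip).shape)  # torch.Size([1, 16, 8, 112, 112])
```

Factoring the spatial and temporal convolutions into separate units in this way uses fewer parameters than a single 3x3x3 convolution, which is consistent with the stated goal of reducing learning network sizes.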