Bidirectional Spatio-Temporal Feature Learning With Multiscale Evaluation for Video Anomaly Detection
Published in | IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 12, pp. 8285-8296
---|---
Main Authors | , , , ,
Format | Journal Article
Language | English
Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.12.2022
Subjects |
Online Access | Get full text
Summary: Video anomaly detection aims to detect segments containing abnormal events in a video sequence, and is a current research hotspot due to its importance for maintaining public security. Recent detection methods tend to build frame-reconstruction or frame-prediction models based on deep learning to learn features of events. Reconstruction-based methods reproduce the input frame one-to-one, inevitably losing some temporal features. Prediction-based methods predict frames in the natural time order but ignore reverse-time information, which biases the learned representation. In addition, anomaly evaluation methods based on patch-level error neglect the diversity of object sizes in complex scenes, making it difficult to determine the optimal error-patch size. To address these issues, we propose a bidirectional spatio-temporal feature learning framework with a multi-scale anomaly evaluation strategy. A video sequence is fed into a double-encoder double-decoder network, and bidirectional spatio-temporal features for bidirectional prediction are obtained by fusing the forward and backward features extracted by the two encoders. The multi-scale anomaly evaluation is implemented with an error pyramid and mean pooling, which effectively detects target objects of different sizes. Experiments on several public video datasets show that our method outperforms most existing methods.
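The summary only sketches the multi-scale evaluation at a high level; the exact error pyramid is defined in the paper itself. As a rough, hypothetical illustration of the idea in PyTorch (the function name, patch sizes, and the max-over-scales combination are assumptions for this sketch, not the authors' implementation), the per-pixel prediction error can be mean-pooled at several patch sizes so that both small and large objects yield a strong patch-level response:

```python
import torch
import torch.nn.functional as F

def multiscale_anomaly_score(pred, target, num_levels=3):
    """Hypothetical multi-scale anomaly score: per-pixel squared
    prediction error is mean-pooled at several patch sizes (an error
    pyramid), and the strongest patch response over all levels is
    taken as the frame's anomaly score."""
    # Per-pixel squared error, averaged over channels -> (N, 1, H, W)
    err = ((pred - target) ** 2).mean(dim=1, keepdim=True)

    scores = []
    for level in range(num_levels):
        k = 2 ** (level + 2)  # assumed patch sizes: 4, 8, 16, ...
        pooled = F.avg_pool2d(err, kernel_size=k, stride=k)
        # Most anomalous patch at this scale, one value per frame
        scores.append(pooled.flatten(1).max(dim=1).values)

    # Combine scales; the max keeps the strongest response over object sizes
    return torch.stack(scores, dim=1).max(dim=1).values

# Example usage with random frames of shape (N, C, H, W)
pred = torch.rand(2, 3, 256, 256)
target = torch.rand(2, 3, 256, 256)
print(multiscale_anomaly_score(pred, target))  # one score per frame
```

Taking the maximum pooled error over all pyramid levels is one simple way to stay sensitive to objects of different sizes without committing to a single patch size; the paper's actual combination rule may differ.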
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2022.3190539