Bidirectional Spatio-Temporal Feature Learning With Multiscale Evaluation for Video Anomaly Detection
Published in | IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 12, pp. 8285-8296
---|---
Main Authors | , , , ,
Format | Journal Article
Language | English
Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.12.2022
Subjects |
Online Access | Get full text
Summary: Video anomaly detection aims to detect segments containing abnormal events in a video sequence, and is a current research hotspot due to its importance for maintaining public security. Recent detection methods tend to build frame-reconstruction or frame-prediction models based on deep learning to learn features of events. Reconstruction-based methods reproduce the input frame one-to-one, inevitably losing some temporal features. Prediction-based methods predict frames in the natural time order but ignore reverse-time information, which biases the learned representation. In addition, anomaly evaluation methods based on patch-level error neglect the diversity of object sizes in complex scenes, making it difficult to determine the optimal error-patch size. To address these issues, we propose a bidirectional spatio-temporal feature learning framework with a multi-scale anomaly evaluation strategy. A video sequence is fed into a double-encoder double-decoder network, and bidirectional spatio-temporal features for bidirectional prediction are obtained by fusing the forward and backward features extracted by the two encoders. The multi-scale anomaly evaluation is implemented with an error pyramid and mean pooling, which effectively detects target objects of different sizes. Experiments on several public video datasets show that our method outperforms most existing methods.
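The summary only sketches the multi-scale evaluation at a high level; the exact error pyramid is defined in the paper itself. As a rough, hypothetical illustration of the idea in PyTorch (the function name, patch sizes, and the max-over-scales combination are assumptions for this sketch, not the authors' implementation), the per-pixel prediction error can be mean-pooled at several patch sizes so that both small and large objects yield a strong patch-level response:

```python
import torch
import torch.nn.functional as F

def multiscale_anomaly_score(pred, target, num_levels=3):
    """Hypothetical multi-scale anomaly score: per-pixel squared
    prediction error is mean-pooled at several patch sizes (an error
    pyramid), and the strongest patch response over all levels is
    taken as the frame's anomaly score."""
    # Per-pixel squared error, averaged over channels -> (N, 1, H, W)
    err = ((pred - target) ** 2).mean(dim=1, keepdim=True)

    scores = []
    for level in range(num_levels):
        k = 2 ** (level + 2)  # assumed patch sizes: 4, 8, 16, ...
        pooled = F.avg_pool2d(err, kernel_size=k, stride=k)
        # Most anomalous patch at this scale, one value per frame
        scores.append(pooled.flatten(1).max(dim=1).values)

    # Combine scales; the max keeps the strongest response over object sizes
    return torch.stack(scores, dim=1).max(dim=1).values

# Example usage with random frames of shape (N, C, H, W)
pred = torch.rand(2, 3, 256, 256)
target = torch.rand(2, 3, 256, 256)
print(multiscale_anomaly_score(pred, target))  # one score per frame
```

Taking the maximum pooled error over all pyramid levels is one simple way to stay sensitive to objects of different sizes without committing to a single patch size; the paper's actual combination rule may differ.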
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2022.3190539