Spatially Aware Fusion in 3D Convolutional Autoencoders for Video Anomaly Detection

Bibliographic Details
Published in	IEEE Access, Vol. 12, pp. 104770-104784
Main Authors	Niaz, Asim; Ul Amin, Sareer; Soomro, Shafiullah; Zia, Hamza; Nam Choi, Kwang
Format	Journal Article
Language	English
Published	Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2024
Subjects	Anomalies; Anomaly detection; autoencoders; Computer Science; computer vision; Crime prevention; Datavetenskap; Effectiveness; Encoders-Decoders; Feature extraction; intelligent surveillance systems; Optical flow; Pedestrians; Predictive models; Public safety; Representations; Surveillance; three-dimensional convolutional neural network (3DCNN); video-based abnormal event detection; Videos
Summary:	Surveillance videos are crucial for crime prevention and public safety, yet the challenge of defining abnormal events hinders their effectiveness, limiting the applicability of supervised methods. This paper introduces an unsupervised end-to-end architecture for video anomaly detection that applies spatial and temporal features to identify anomalies in surveillance footage. The model employs a three-dimensional (3D) convolutional autoencoder, with an encoder-decoder structure that learns spatiotemporal representations and reconstructs the input through the latent space. Skip connections linking the encoder and decoder blocks facilitate the transfer of information across various scales of feature representations, enhancing the reconstruction process and improving the overall performance. The architecture incorporates spatial attention modules that highlight informative regions in the input, enabling improved anomaly detection. Spatial and contextual dependencies are further acquired using 3D convolutional filters. The performance of the proposed model is assessed on four benchmark datasets: UCSD Pedestrian 1, UCSD Pedestrian 2, CUHK Avenue, and ShanghaiTech. Notably, the proposed model achieves frame-based Area Under the Curve (AUC) scores of 94.6% on UCSD Ped 1, 96.7% on UCSD Ped 2, 84.7% on CUHK Avenue, and 74.8% on ShanghaiTech. These results demonstrate the state-of-the-art performance of the proposed approach, highlighting its efficacy in real-world anomaly detection scenarios.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3435144
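
The summary reports frame-based AUC scores. As a rough illustration of how such a score can be computed, the sketch below turns per-frame reconstruction errors into anomaly scores and evaluates them against frame labels. It is not the authors' evaluation code: the per-video min-max normalization, the function names `normalize_scores` and `frame_auc`, and the toy error values are all assumptions made for illustration.

```python
from typing import List, Sequence


def normalize_scores(errors: Sequence[float]) -> List[float]:
    """Min-max normalize one video's per-frame reconstruction errors to [0, 1].

    Assumption: higher reconstruction error means a more anomalous frame.
    """
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # A constant error carries no ranking information.
        return [0.0 for _ in errors]
    return [(e - lo) / (hi - lo) for e in errors]


def frame_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Frame-level area under the ROC curve.

    Uses the rank identity: AUC is the probability that a randomly chosen
    anomalous frame (label 1) scores higher than a randomly chosen normal
    frame (label 0), with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up numbers): frames 3 and 4 are labeled anomalous.
errors = [0.10, 0.12, 0.11, 0.40, 0.35, 0.13]
labels = [0, 0, 0, 1, 1, 0]
scores = normalize_scores(errors)
auc = frame_auc(scores, labels)
print(f"frame-level AUC: {auc:.3f}")
```

The pairwise comparison is quadratic in the number of frames, which is fine for a short illustration; for full benchmark videos a sort-based or library implementation would be the more practical choice.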