Human-Scene Network: A novel baseline with self-rectifying loss for weakly supervised video anomaly detection

Bibliographic Details
Published in	Computer vision and image understanding Vol. 241; p. 103955
Main Authors	Majhi, Snehashis; Dai, Rui; Kong, Quan; Garattoni, Lorenzo; Francesca, Gianpiero; Brémond, François
Format	Journal Article
Language	English
Published	Elsevier Inc 01.04.2024
Subjects	Video anomaly detection; Weakly-supervised learning; 65D05; 65D17; 41A05; 41A10

More Information
Summary:	Video anomaly detection in surveillance systems with only video-level labels (i.e. weakly supervised) is challenging. This is due to (i) the complex integration of a large variety of scenarios including human and scene-based anomalies characterized by subtle or sharp spatio-temporal cues in real-world videos and (ii) non-optimal optimization between normal and anomaly instances under weak supervision. In this paper, we propose a Human-Scene Network to learn discriminative representations by capturing both subtle and strong cues in a dissociative manner. In addition, a self-rectifying loss is proposed that dynamically computes the pseudo-temporal annotations from video-level labels for optimizing the Human-Scene Network effectively. The proposed Human-Scene Network optimized with self-rectifying loss is validated on three publicly available datasets i.e. UCF-Crime, ShanghaiTech, and IITB-Corridor, outperforming recently reported state-of-the-art approaches on five out of the six scenarios considered.
• A Human-Scene Network to detect human and scene centric divergent video anomalies.
• An effective and salient feature combination strategy in decoupled sub-networks.
• A self-rectifying loss for better separability among instances in weak-supervision.
• The results outperform benchmark methods on many scenarios considered.
ISSN:	1077-3142 1090-235X
DOI:	10.1016/j.cviu.2024.103955