Counting people inside a region-of-interest in CCTV footage with deep learning

Bibliographic Details
Published in	PeerJ. Computer science Vol. 8; p. e1067
Main Authors	Pardamean, Bens; Abid, Faizal; Cenggoro, Tjeng Wawan; Elwirehardja, Gregorius Natanael; Muljo, Hery Harjono
Format	Journal Article
Language	English
Published	United States PeerJ. Ltd 22.09.2022 PeerJ, Inc
Subjects	Building management systems; Closed circuit television; Computer networks; Computer Vision; Convolutional neural networks; Counting; Crowds; Data Mining and Machine Learning; Datasets; Deep learning; Indonesia; Machine learning; Methods; Object recognition (Computers); Pattern recognition; People counting; Performance degradation; Performance enhancement; Region-of-Interest; Root-mean-square errors
Summary:	In recent years, the performance of people-counting models has improved so dramatically that they can now be deployed in practical settings. However, current models can only count all of the people captured in the input closed-circuit television (CCTV) footage. Often, we only want to count people in a specific Region-of-Interest (RoI) in the footage. Unfortunately, simple approaches such as covering the area outside the RoI degrade the performance of the models. Therefore, we developed a novel learning strategy that enables a deep-learning-based people-counting model to count people only in a certain RoI. In the proposed method, the people-counting model has two heads attached on top of a crowd-counting backbone network. These two heads respectively learn to count people inside the RoI and negate the people count outside the RoI. We named the proposed method Gap Regularizer and tested it on ResNet-50, ResNet-101, CSRNet, and SFCN. The experimental results showed that Gap Regularizer can reduce the mean absolute error (MAE), root mean square error (RMSE), and grid average mean error (GAME) of ResNet-50, which is the smallest CNN model, by up to 45.2%, 41.25%, and 46.43%, respectively. On shallow models such as CSRNet, the regularizer can also drastically increase the SSIM, by up to 248.65%, in addition to reducing the MAE, RMSE, and GAME. Gap Regularizer can also improve the performance of SFCN, a deep CNN model with back-end features, by up to 17.22% and 10.54% compared to its standard version. Moreover, the impacts of Gap Regularizer on these two models are also generally statistically significant (p-value < 0.05) on the MOT17-09, MOT20-02, and RHC datasets. However, the method has a limitation: it is unable to make a significant impact on deep models without back-end features, such as ResNet-101.
ISSN:	2376-5992
DOI:	10.7717/peerj-cs.1067
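
The summary reports results as MAE, RMSE, and GAME. As a rough illustration of what these error measures compute, here is a minimal NumPy sketch. It is not the authors' evaluation code: the function names are hypothetical, and the GAME function assumes the commonly used definition (split each density map into a 2^L x 2^L grid, sum the absolute count errors per cell, and average over images), which this record does not spell out.

```python
import numpy as np


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and true per-image counts."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(pred - true)))


def rmse(pred_counts, true_counts):
    """Root mean square error between predicted and true per-image counts."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))


def game(pred_maps, true_maps, level=1):
    """Grid-based count error over density maps of shape (N, H, W).

    Assumed definition: each map is split into a (2**level x 2**level)
    grid, absolute count errors are summed over the cells, and the
    result is averaged over the N images. Level 0 uses a single cell,
    so it equals the MAE of the total counts.
    """
    pred = np.asarray(pred_maps, dtype=float)
    true = np.asarray(true_maps, dtype=float)
    if pred.shape != true.shape or pred.ndim != 3:
        raise ValueError("expected two arrays of identical shape (N, H, W)")

    n_images, height, width = pred.shape
    cells = 2 ** level
    row_edges = np.linspace(0, height, cells + 1).astype(int)
    col_edges = np.linspace(0, width, cells + 1).astype(int)

    total_error = 0.0
    for r in range(cells):
        for c in range(cells):
            rows = slice(row_edges[r], row_edges[r + 1])
            cols = slice(col_edges[c], col_edges[c + 1])
            # Count in a cell = sum of the density map over that cell.
            pred_cell = pred[:, rows, cols].sum(axis=(1, 2))
            true_cell = true[:, rows, cols].sum(axis=(1, 2))
            total_error += np.abs(pred_cell - true_cell).sum()
    return float(total_error / n_images)
```

For example, if a 4 x 4 ground-truth map has one person in the top-left corner and the prediction places one person in the bottom-right corner, the total counts agree, so `game(..., level=0)` is 0, while `game(..., level=1)` is 2 because the grid penalizes the misplaced count. This sensitivity to location is why a grid-based measure is informative when only people inside an RoI should be counted.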