Crowd Counting in Large Surveillance Areas by Fusing Audio and WiFi Sniffing Data

Bibliographic Details
Published in: 2024 International Joint Conference on Neural Networks (IJCNN), pp. 1 - 8
Main Authors: Guo, Rui; Huang, Baoqi; Hao, Lifei; Jia, Bing
Format: Conference Proceeding
Language: English
Published: IEEE, 30.06.2024

Summary: Popular vision-based crowd counting methods suffer from high cost, limited coverage, and high complexity, making them difficult to apply to large surveillance areas, while emerging WiFi-based methods, which do suit large surveillance areas, achieve only limited accuracy because WiFi sniffing data are sparse and random. Considering that variations in audio data are spatio-temporally correlated with crowd fluctuations, this paper proposes fusing audio and WiFi sniffing data for crowd counting via a Cross-modal Multi-level Perception Network, termed CMPN. The CMPN not only extracts crowd features from the bimodal data, leveraging temporal continuity to compensate for sparse WiFi sniffing data, but also mines the correlations of intra- and inter-modality crowd features for accurate crowd counting. Extensive experiments conducted on a real campus with a surveillance area of about 4000 m² demonstrate that the CMPN achieves a mean absolute error of 5.88, a 22.12% reduction compared to the state-of-the-art WiFi-only method.
ISSN: 2161-4407
DOI: 10.1109/IJCNN60899.2024.10651535
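
The summary above describes a two-branch network that fuses continuous audio features with sparse WiFi sniffing features and exploits cross-modal correlations for count regression. The following is a minimal, hypothetical PyTorch sketch of that general fusion pattern only; the layer choices, feature dimensions, attention-based fusion, and the class name CrossModalCountRegressor are illustrative assumptions and do not reproduce the paper's CMPN architecture.

# Hypothetical sketch of a two-branch cross-modal fusion regressor for crowd
# counting. All dimensions and the fusion strategy are illustrative assumptions.
import torch
import torch.nn as nn


class CrossModalCountRegressor(nn.Module):
    def __init__(self, audio_dim=64, wifi_dim=16, hidden_dim=128):
        super().__init__()
        # Per-modality sequence encoders over short time windows (assumed GRU-based).
        self.audio_enc = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.wifi_enc = nn.GRU(wifi_dim, hidden_dim, batch_first=True)
        # Cross-modal attention: sparse WiFi features attend to continuous audio
        # features, borrowing temporal context the sniffing data lacks.
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        # Regression head mapping intra- and inter-modality features to a count.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, audio_seq, wifi_seq):
        a, _ = self.audio_enc(audio_seq)   # (B, Ta, H) audio features over time
        w, _ = self.wifi_enc(wifi_seq)     # (B, Tw, H) WiFi sniffing features
        fused, _ = self.cross_attn(query=w, key=a, value=a)  # (B, Tw, H)
        # Pool over time and concatenate intra-modal (WiFi) and cross-modal features.
        feat = torch.cat([w.mean(dim=1), fused.mean(dim=1)], dim=-1)
        return self.head(feat).squeeze(-1)  # predicted crowd count per window


if __name__ == "__main__":
    model = CrossModalCountRegressor()
    audio = torch.randn(8, 50, 64)  # 8 windows, 50 audio frames, 64-d features
    wifi = torch.randn(8, 10, 16)   # 10 sparse sniffing snapshots, 16-d features
    print(model(audio, wifi).shape)  # torch.Size([8])

Such a model would typically be trained with an L1 loss against ground-truth counts, which directly corresponds to the mean absolute error metric reported in the summary.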