Relational Attention Network for Crowd Counting

Bibliographic Details
Published in: Proceedings / IEEE International Conference on Computer Vision, pp. 6787 - 6796
Main Authors: Zhang, Anran; Shen, Jiayi; Xiao, Zehao; Zhu, Fan; Zhen, Xiantong; Cao, Xianbin; Shao, Ling
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2019

Summary: Crowd counting is receiving rapidly growing research interest due to its potential application value in numerous real-world scenarios. However, owing to challenges such as occlusion, insufficient resolution, and dynamic backgrounds, crowd counting remains an unsolved problem in computer vision. Density estimation is a popular strategy for crowd counting, where conventional density estimation methods perform pixel-wise regression without explicitly accounting for the interdependence of pixels. As a result, independent pixel-wise predictions can be noisy and inconsistent. To address this issue, we propose a Relational Attention Network (RANet) with a self-attention mechanism for capturing the interdependence of pixels. RANet enhances the self-attention mechanism by accounting for both short-range and long-range interdependence of pixels, implemented respectively as local self-attention (LSA) and global self-attention (GSA). We further introduce a relation module that fuses LSA and GSA to obtain more informative aggregated feature representations. We conduct extensive experiments on four public datasets: ShanghaiTech A, ShanghaiTech B, UCF-CC-50, and UCF-QNRF. Experimental results on all datasets show that RANet consistently reduces estimation errors and surpasses state-of-the-art approaches by large margins.
ISSN: 2380-7504
DOI: 10.1109/ICCV.2019.00689
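
The summary describes pixel-wise density regression refined by local self-attention (LSA) for short-range dependencies, global self-attention (GSA) for long-range dependencies, and a relation module that fuses the two. The sketch below is a minimal, hypothetical PyTorch illustration of that idea, not the authors' implementation: the class names, the window size of 8, the channel widths, and the 1x1-convolution fusion standing in for the relation module are all assumptions made here for clarity.

```python
# Hypothetical sketch (not the paper's released code): local and global
# self-attention over a backbone feature map, fused and regressed into a
# per-pixel density map whose sum approximates the crowd count.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalSelfAttention(nn.Module):
    """Self-attention across all spatial positions (long-range interdependence)."""

    def __init__(self, channels):
        super().__init__()
        reduced = max(channels // 2, 1)
        self.query = nn.Conv2d(channels, reduced, 1)
        self.key = nn.Conv2d(channels, reduced, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)      # (B, HW, C')
        k = self.key(x).flatten(2)                        # (B, C', HW)
        v = self.value(x).flatten(2).transpose(1, 2)      # (B, HW, C)
        attn = torch.softmax(q @ k / q.shape[-1] ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out + x                                    # residual connection


class LocalSelfAttention(nn.Module):
    """The same attention, restricted to non-overlapping windows (short-range)."""

    def __init__(self, channels, window=8):
        super().__init__()
        self.window = window
        self.attn = GlobalSelfAttention(channels)

    def forward(self, x):
        b, c, h, w = x.shape
        win = self.window
        x = F.pad(x, (0, (-w) % win, 0, (-h) % win))      # pad to a multiple of win
        hp, wp = x.shape[-2:]
        tiles = (x.reshape(b, c, hp // win, win, wp // win, win)
                  .permute(0, 2, 4, 1, 3, 5)
                  .reshape(-1, c, win, win))              # one tile per window
        tiles = self.attn(tiles)                          # attend within each window
        x = (tiles.reshape(b, hp // win, wp // win, c, win, win)
                  .permute(0, 3, 1, 4, 2, 5)
                  .reshape(b, c, hp, wp))
        return x[:, :, :h, :w]                            # drop padding


class RANetSketch(nn.Module):
    """Illustrative LSA + GSA fusion followed by a density-regression head."""

    def __init__(self, channels=512, window=8):
        super().__init__()
        self.lsa = LocalSelfAttention(channels, window)
        self.gsa = GlobalSelfAttention(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)  # stand-in for the relation module
        self.head = nn.Conv2d(channels, 1, 1)             # per-pixel density

    def forward(self, feats):
        fused = self.fuse(torch.cat([self.lsa(feats), self.gsa(feats)], dim=1))
        return F.relu(self.head(fused))                   # non-negative density map


feats = torch.randn(1, 512, 48, 64)   # stand-in for backbone features
density = RANetSketch()(feats)        # (1, 1, 48, 64) density map
print(density.sum().item())           # summing the map gives the count estimate
```

Summing the predicted density map is the standard readout in density-based counting; only the combination of local, global, and fused attention mirrors the summary, while every architectural detail above is an assumption.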