CCST: crowd counting with swin transformer


Bibliographic Details
Published in: The Visual Computer, Vol. 39, No. 7, pp. 2671-2682
Main Authors: Li, Bo; Zhang, Yong; Xu, Haihui; Yin, Baocai
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 01.07.2023 (Springer Nature B.V.)

Summary: Accurately estimating the number of individuals in an image is the goal of crowd counting. The task has long faced two major difficulties: uneven distribution of crowd density and a large span of head sizes. For the former, most CNN-based methods divide the image into multiple patches for processing, ignoring the connections between patches. For the latter, multi-scale feature fusion methods based on feature pyramids ignore the matching relationship between head size and the hierarchical features. In response to these issues, we propose a crowd counting network named CCST based on the Swin Transformer, and tailor a feature-adaptive fusion regression head called FAFHead. The Swin Transformer can fully exchange information within and between patches, effectively alleviating the problem of unevenly distributed crowd density. FAFHead can adaptively fuse multi-level features, improving the match between head size and feature pyramid level and relieving the problem of the large span of head sizes. Experimental results on common datasets show that CCST outperforms all weakly supervised counting works and the great majority of popular density map-based fully supervised works.
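The abstract gives no implementation details, but the fusion idea can be sketched. Below is a minimal, hypothetical PyTorch sketch of an adaptive multi-level fusion regression head in the spirit of FAFHead: each pyramid level from a Swin-like backbone is projected to a common width, input-dependent softmax weights decide each level's contribution, and a scalar count is regressed (count-level supervision, as in weakly supervised counting). All names, channel sizes, and design choices here are assumptions for illustration, not the paper's actual architecture.

```python
# Hypothetical sketch of adaptive multi-level feature fusion for count
# regression. Module name (FAFHead), channel widths, and the gating scheme
# are illustrative assumptions, not the architecture from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FAFHead(nn.Module):
    """Aligns pyramid levels to one width, predicts per-level fusion
    weights from pooled features, and regresses a scalar crowd count."""

    def __init__(self, in_channels=(192, 384, 768), width=256):
        super().__init__()
        # 1x1 convs align every pyramid level to the same channel width.
        self.align = nn.ModuleList(nn.Conv2d(c, width, 1) for c in in_channels)
        # Predicts one fusion logit per level from globally pooled features.
        self.gate = nn.Linear(width * len(in_channels), len(in_channels))
        self.regress = nn.Sequential(
            nn.Linear(width, width), nn.ReLU(inplace=True), nn.Linear(width, 1)
        )

    def forward(self, feats):
        # Upsample all levels to the finest resolution, then align channels.
        size = feats[0].shape[-2:]
        aligned = [
            F.interpolate(proj(f), size=size, mode="bilinear", align_corners=False)
            for proj, f in zip(self.align, feats)
        ]
        # Input-dependent softmax weights decide how much each level matters,
        # one plausible way to match head size to pyramid hierarchy.
        pooled = torch.cat([a.mean(dim=(2, 3)) for a in aligned], dim=1)
        w = torch.softmax(self.gate(pooled), dim=1)  # (B, num_levels)
        fused = sum(w[:, i, None, None, None] * aligned[i]
                    for i in range(len(aligned)))
        return self.regress(fused.mean(dim=(2, 3)))  # (B, 1) count

# Usage with dummy pyramid features (shapes roughly Swin-T stages 2-4):
feats = [torch.randn(2, 192, 28, 28),
         torch.randn(2, 384, 14, 14),
         torch.randn(2, 768, 7, 7)]
print(FAFHead()(feats).shape)  # torch.Size([2, 1])
```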
ISSN: 0178-2789
EISSN: 1432-2315
DOI: 10.1007/s00371-022-02485-3