CSTrans: Correlation-guided Self-Activation Transformer for Counting Everything
Published in | Pattern recognition, Vol. 153, p. 110556
---|---
Main Authors | Gao, Bin-Bin; Huang, Zhongyi
Format | Journal Article
Language | English
Published | Elsevier Ltd, 01.09.2024
Subjects | Counting everything; Few-shot counting; Local dependency; Vision transformer
Abstract | Counting everything, also called few-shot counting, requires a model to count objects of any novel (unseen) category given only a few exemplar boxes. However, existing few-shot counting methods are sub-optimal due to weak feature representation, both in the correlation between exemplar patches and query features and in the contextual dependencies used for density map prediction. In this paper, we propose a very simple but effective method, CSTrans, consisting of a Correlation-guided Self-Activation (CSA) module and a Local Dependency Transformer (LDT) module, which mitigate these two issues, respectively. The CSA module utilizes the correlation map to activate semantic features and suppress noise in the query features, mining their latent relations while enriching the correlation representation. The LDT module incorporates a Transformer to explore local contextual dependencies and predict the density map. Our method achieves competitive performance on the FSC-147 and CARPK datasets. We hope its simple implementation and strong performance can serve as a new baseline for few-shot counting and attract more interest in designing simple but effective models in future studies. Our code for CSTrans is available at https://github.com/gaobb/CSTrans.
Highlights:
• A simple but effective CSTrans framework for counting everything.
• A correlation-guided self-activation module for enriching feature representation.
• A local dependency transformer module for modeling local context dependency.
• Excellent performance on two few-shot counting datasets, FSC-147 and CARPK.
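The abstract above is the only technical description in this record, so here is a minimal, unofficial PyTorch sketch of its two components to make them concrete. Everything beyond what the abstract states — the tensor shapes, the sigmoid gate, the window size and layer counts, and the class names `CorrelationSelfActivation` and `LocalDependencyTransformer` — is an illustrative assumption, not the authors' design; the official implementation is at https://github.com/gaobb/CSTrans.

```python
# Illustrative sketch only. The abstract says the CSA module uses the
# exemplar-query correlation map to activate semantic features and suppress
# noise, and the LDT module applies a Transformer over local context to
# predict a density map. All concrete choices below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CorrelationSelfActivation(nn.Module):
    """Gate query features with an exemplar-query correlation map (sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv turns the single-channel correlation map into a per-channel gate.
        self.gate = nn.Conv2d(1, channels, kernel_size=1)

    def forward(self, query_feat: torch.Tensor, exemplar_feat: torch.Tensor) -> torch.Tensor:
        # query_feat: (B, C, H, W) features of the full image.
        # exemplar_feat: (B, C, h, w) features of one exemplar box.
        b, c, h, w = exemplar_feat.shape
        # Per-sample cross-correlation via the grouped-conv trick:
        # one exemplar kernel per batch element.
        corr = F.conv2d(
            query_feat.reshape(1, -1, *query_feat.shape[-2:]),  # (1, B*C, H, W)
            exemplar_feat,                                      # B kernels of shape (C, h, w)
            padding=(h // 2, w // 2),
            groups=b,
        )                                                       # (1, B, ~H, ~W)
        corr = corr.reshape(b, 1, *corr.shape[-2:])
        corr = corr[..., : query_feat.shape[-2], : query_feat.shape[-1]]
        # Use the correlation as a self-activation gate on the query features.
        return query_feat * torch.sigmoid(self.gate(corr))


class LocalDependencyTransformer(nn.Module):
    """Attend within local windows, then regress a density map (sketch)."""

    def __init__(self, channels: int, window: int = 8, heads: int = 4):
        super().__init__()
        self.window = window
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        s = self.window
        assert h % s == 0 and w % s == 0, "pad inputs to a multiple of the window size"
        # Partition into non-overlapping s x s windows; attention stays local.
        x = x.reshape(b, c, h // s, s, w // s, s).permute(0, 2, 4, 3, 5, 1)
        x = x.reshape(-1, s * s, c)          # (B * n_windows, s*s, C) token sequences
        x = self.encoder(x)
        # Merge the windows back into a feature map.
        x = x.reshape(b, h // s, w // s, s, s, c).permute(0, 5, 1, 3, 2, 4)
        x = x.reshape(b, c, h, w)
        return self.head(x)                  # (B, 1, H, W) predicted density map
```

As in other density-based counters, summing the predicted map over its spatial dimensions yields the object count, which is how the MAE/RMSE numbers on FSC-147 and CARPK are typically computed; e.g. `LocalDependencyTransformer(256)(CorrelationSelfActivation(256)(torch.randn(2, 256, 64, 64), torch.randn(2, 256, 7, 7))).sum(dim=(1, 2, 3))`.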
ArticleNumber | 110556 |
Author | Gao, Bin-Bin; Huang, Zhongyi |
CitedBy | 10.1007/s11760-024-03792-z; 10.1016/j.imavis.2024.105383 |
ContentType | Journal Article |
Copyright | 2024 Elsevier Ltd |
DOI | 10.1016/j.patcog.2024.110556 |
Discipline | Computer Science |
EISSN | 1873-5142 |
GrantInformation | Tencent |
ISSN | 0031-3203 |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Counting everything; Vision transformer; Local dependency; Few-shot counting |
Language | English |
ORCID | 0000-0003-2572-8156 |
PublicationDate | September 2024 |
PublicationTitle | Pattern recognition |
PublicationYear | 2024 |
Publisher | Elsevier Ltd |
StartPage | 110556 |
Title | CSTrans: Correlation-guided Self-Activation Transformer for Counting Everything |
URI | https://dx.doi.org/10.1016/j.patcog.2024.110556 |
Volume | 153 |