Part-guided Relational Transformers for Fine-grained Visual Recognition
Published in | IEEE Transactions on Image Processing, Vol. 30, pp. 9470–9481 |
Main Authors | Zhao, Yifan; Li, Jia; Chen, Xiaowu; Tian, Yonghong |
Format | Journal Article |
Language | English |
Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.01.2021 |
Subjects | Correlation; Costs; Embedding; Feature extraction; Fine-grained visual recognition; Modules; Natural language processing; Object recognition; Part discovery; Relationship; Semantics; Task analysis; Transformers; Visualization |
Abstract | Fine-grained visual recognition aims to classify objects with visually similar appearances into subcategories, a task that has made great progress with the development of deep CNNs. However, handling subtle differences between subcategories remains a challenge. In this paper, we propose to solve this issue in one unified framework from two aspects: constructing feature-level interrelationships and capturing part-level discriminative features. The framework, PArt-guided Relational Transformers (PART), learns discriminative part features with an automatic part discovery module and explores their intrinsic correlations with a feature transformation module that adapts Transformer models from the field of natural language processing. The part discovery module efficiently discovers discriminative regions that correspond closely to the gradient-descent procedure. The feature transformation module then builds correlations between the global embedding and multiple part embeddings, enhancing spatial interactions among semantic pixels. Moreover, our proposed approach does not rely on additional part branches at inference time and reaches state-of-the-art performance on three widely used fine-grained object recognition benchmarks. Experimental results and explainable visualizations demonstrate the effectiveness of our proposed approach. |
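The abstract describes two components: a part discovery module that picks out discriminative regions, and a transformer-style module that relates the global embedding to the part embeddings. A minimal NumPy sketch of those two ideas is below; the function names, the top-k-by-norm part selection, and all sizes are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of (1) part discovery over a CNN feature map and
# (2) one self-attention step relating global and part embeddings.
# All names and heuristics here are hypothetical stand-ins.
import numpy as np

def discover_parts(feat, k=4):
    """Pick the k spatial positions with the largest activation norm
    as crude 'part' embeddings. feat: (H, W, C) feature map."""
    H, W, C = feat.shape
    flat = feat.reshape(H * W, C)
    norms = np.linalg.norm(flat, axis=1)
    idx = np.argsort(norms)[-k:]           # top-k responding locations
    return flat[idx]                       # (k, C)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relate(tokens, Wq, Wk, Wv):
    """One self-attention layer over the [global; parts] tokens: (k+1, C)."""
    q, k_, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k_.T / np.sqrt(q.shape[-1]))
    return tokens + attn @ v               # residual connection

rng = np.random.default_rng(0)
feat = rng.normal(size=(14, 14, 32))       # stand-in CNN feature map
parts = discover_parts(feat, k=4)          # (4, 32) part embeddings
g = feat.mean(axis=(0, 1))[None]           # (1, 32) global embedding
tokens = np.concatenate([g, parts])        # (5, 32): [global; parts]
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(32, 32)) for _ in range(3))
out = relate(tokens, Wq, Wk, Wv)
print(out.shape)                           # out[0] is the enhanced global token
```

After the attention step, `out[0]` is the global token enriched by the part tokens, matching the abstract's point that no extra part branches are needed at inference: only the enhanced global embedding feeds the classifier.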
Author | Zhao, Yifan Li, Jia Chen, Xiaowu Tian, Yonghong |
CODEN | IIPRE4 |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021 |
DOI | 10.1109/TIP.2021.3126490 |
EISSN | 1941-0042 |
EndPage | 9481 |
ExternalDocumentID | 10_1109_TIP_2021_3126490 9614988 |
Genre | orig-research |
GrantInformation_xml | – fundername: National Natural Science Foundation of China grantid: 61825101; 61922006 funderid: 10.13039/501100001809 |
ISSN | 1057-7149 |
IsPeerReviewed | true |
IsScholarly | true |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
ORCID | 0000-0002-3976-6500 0000-0002-2978-5935 0000-0002-4346-8696 0000-0002-5691-013X |
PublicationDate | 2021-01-01 |
StartPage | 9470 |
SubjectTerms | Correlation Costs Embedding Feature extraction Fine-grained visual recognition Modules Natural language processing Object recognition Part discovery Relationship Semantics Task analysis Transformers Visualization |
Title | Part-guided Relational Transformers for Fine-grained Visual Recognition |
URI | https://ieeexplore.ieee.org/document/9614988 https://www.proquest.com/docview/2599209927 https://search.proquest.com/docview/2598075392 |
Volume | 30 |
Citation | Zhao, Y.; Li, J.; Chen, X.; Tian, Y.: Part-Guided Relational Transformers for Fine-Grained Visual Recognition. IEEE Transactions on Image Processing, vol. 30, pp. 9470–9481, 2021. doi:10.1109/TIP.2021.3126490 |