Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Published in | arXiv.org |
---|---|
Main Authors | Qibin Hou, Cheng-Ze Lu, Ming-Ming Cheng, Jiashi Feng |
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 22.11.2022 |
Subjects | Artificial neural networks; Image segmentation; Modulation; Object recognition; Vision |
Online Access | https://www.proquest.com/docview/2739284952 |
Abstract | This paper does not attempt to design a state-of-the-art method for visual recognition but investigates a more efficient way to make use of convolutions to encode spatial features. By comparing the design principles of recent convolutional neural networks (ConvNets) and Vision Transformers, we propose to simplify self-attention by leveraging a convolutional modulation operation. We show that such a simple approach can better take advantage of the large kernels (≥7×7) nested in convolutional layers. We build a family of hierarchical ConvNets using the proposed convolutional modulation, termed Conv2Former. Our network is simple and easy to follow. Experiments show that our Conv2Former outperforms existing popular ConvNets and Vision Transformers, such as Swin Transformer and ConvNeXt, on ImageNet classification, COCO object detection, and ADE20k semantic segmentation. |
---|---|
Author | Hou, Qibin; Lu, Cheng-Ze; Cheng, Ming-Ming; Feng, Jiashi |
ContentType | Paper |
Copyright | 2022. This work is published under http://creativecommons.org/licenses/by-nc-sa/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DBID | 8FE 8FG ABJCF ABUWG AFKRA AZQEC BENPR BGLVJ CCPQU DWQXO HCIFZ L6V M7S PIMPY PQEST PQQKQ PQUKI PRINS PTHSS |
DatabaseName | ProQuest SciTech Collection; ProQuest Technology Collection; Materials Science & Engineering Collection; ProQuest Central (Alumni); ProQuest Central UK/Ireland; ProQuest Central Essentials; ProQuest Central; Technology Collection; ProQuest One Community College; ProQuest Central Korea; SciTech Premium Collection; ProQuest Engineering Collection; Engineering Database; Publicly Available Content Database; ProQuest One Academic Eastern Edition (DO NOT USE); ProQuest One Academic; ProQuest One Academic UKI Edition; ProQuest Central China; Engineering Collection |
DatabaseTitle | Publicly Available Content Database; Engineering Database; Technology Collection; ProQuest Central Essentials; ProQuest One Academic Eastern Edition; ProQuest Central (Alumni Edition); SciTech Premium Collection; ProQuest One Community College; ProQuest Technology Collection; ProQuest SciTech Collection; ProQuest Central China; ProQuest Central; ProQuest Engineering Collection; ProQuest One Academic UKI Edition; ProQuest Central Korea; Materials Science & Engineering Collection; ProQuest One Academic; Engineering Collection |
DatabaseTitleList | Publicly Available Content Database |
Database_xml | – sequence: 1 dbid: 8FG name: ProQuest Technology Collection url: https://search.proquest.com/technologycollection1 sourceTypes: Aggregation Database |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Physics |
EISSN | 2331-8422 |
Genre | Working Paper/Pre-Print |
GroupedDBID | 8FE 8FG ABJCF ABUWG AFKRA ALMA_UNASSIGNED_HOLDINGS AZQEC BENPR BGLVJ CCPQU DWQXO FRJ HCIFZ L6V M7S M~E PIMPY PQEST PQQKQ PQUKI PRINS PTHSS |
ID | FETCH-proquest_journals_27392849523 |
IEDL.DBID | BENPR |
IngestDate | Wed Oct 16 12:53:52 EDT 2024 |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
LinkModel | DirectLink |
MergedId | FETCHMERGED-proquest_journals_27392849523 |
OpenAccessLink | https://www.proquest.com/docview/2739284952?pq-origsite=%requestingapplication% |
PQID | 2739284952 |
PQPubID | 2050157 |
ParticipantIDs | proquest_journals_2739284952 |
PublicationCentury | 2000 |
PublicationDate | 20221122 |
PublicationDateYYYYMMDD | 2022-11-22 |
PublicationDecade | 2020 |
PublicationPlace | Ithaca |
PublicationTitle | arXiv.org |
PublicationYear | 2022 |
Publisher | Cornell University Library, arXiv.org |
SSID | ssj0002672553 |
Score | 3.4295244 |
SecondaryResourceType | preprint |
SourceID | proquest |
SourceType | Aggregation Database |
SubjectTerms | Artificial neural networks; Image segmentation; Modulation; Object recognition; Vision |
Title | Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition |
URI | https://www.proquest.com/docview/2739284952 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | ProQuest |
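
The abstract names the key operation, convolutional modulation, but gives no formula. The NumPy sketch below is a hedged illustration and not the authors' code. It assumes the modulation takes the form Z = (DWConv_k(X W1) ⊙ (X W2)) W3. Here a large depthwise k×k convolution produces per-position weights A, which modulate a linear "value" projection V through an element-wise (Hadamard) product. This product stands in for the softmax(QKᵀ)V step of self-attention. Several details are illustrative assumptions: the function names `depthwise_conv2d`, `conv_modulation` and `attention_vs_modulation_macs`, the square C×C weight shapes, the kernel size 11, and the omission of normalization, activations and the surrounding hierarchical stages.

```python
import numpy as np


def depthwise_conv2d(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Depthwise k x k convolution (cross-correlation) with zero 'same' padding.

    x: (H, W, C) feature map; kernels: (k, k, C), one filter per channel, k odd.
    """
    h, w, _ = x.shape
    k = kernels.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero padding (mode='constant')
    out = np.zeros(x.shape, dtype=np.result_type(x, kernels, float))
    for i in range(k):
        for j in range(k):
            # Each tap scales a shifted copy of the input channel by channel;
            # channels never mix, which is what makes the convolution depthwise.
            out += xp[i:i + h, j:j + w, :] * kernels[i, j, :]
    return out


def conv_modulation(x, w1, w2, w3, dw_kernels):
    """Hypothetical convolutional modulation (assumed form):
    Z = (DWConv(x @ w1) * (x @ w2)) @ w3
    """
    a = depthwise_conv2d(x @ w1, dw_kernels)  # A: conv-derived modulation weights
    v = x @ w2                                # V: linear "value" projection
    return (a * v) @ w3                       # Hadamard product, then output projection


def attention_vs_modulation_macs(h, w, c, k):
    """Rough multiply-add counts for the spatial-mixing step only."""
    n = h * w
    attention = 2 * n * n * c   # Q @ K^T plus softmax(.) @ V
    modulation = n * k * k * c  # depthwise k x k convolution
    return attention, modulation


rng = np.random.default_rng(0)
H, W, C, K = 14, 14, 8, 11  # K = 11 is an assumed "large kernel" size
x = rng.standard_normal((H, W, C))
w1, w2, w3 = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
dw = rng.standard_normal((K, K, C)) / K
print(conv_modulation(x, w1, w2, w3, dw).shape)  # (14, 14, 8)
print(attention_vs_modulation_macs(56, 56, 64, K))  # (1258815488, 24285184)
```

Each output position mixes only a k×k neighbourhood. The spatial-mixing cost therefore grows linearly in the number of positions N = H·W, at about N·k²·C multiply-adds. Self-attention instead grows quadratically, at about 2·N²·C for the QKᵀ and attention-times-V products. This is one plausible way to read why large kernels stay affordable in such a design. The helper counts only these terms and ignores the 1×1 projections that both designs share.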