Replacing softmax with ReLU in Vision Transformers

Previous research observed accuracy degradation when replacing the attention softmax with a point-wise activation such as ReLU. In the context of vision transformers, we find that this degradation is mitigated when dividing by sequence length. Our experiments training small to large vision transformers on ImageNet-21k indicate that ReLU-attention can approach or match the performance of softmax-attention in terms of scaling behavior as a function of compute.

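As a point of reference, here is a minimal sketch in JAX of the two attention variants the abstract compares: standard softmax attention, and the ReLU variant in which a point-wise ReLU of the scaled dot-product logits is divided by the sequence length L. The function names, shapes, and the exact 1/L scaling are illustrative assumptions for a single head, not the authors' released implementation.

    # Sketch: softmax-attention vs. ReLU-attention for a single head.
    # Assumes the 1/L scaling described in the abstract; hypothetical names.
    import jax
    import jax.numpy as jnp

    def softmax_attention(q, k, v):
        # q, k, v: [L, d] arrays for one head.
        d = q.shape[-1]
        logits = q @ k.T / jnp.sqrt(d)       # [L, L] scaled dot-product logits
        weights = jax.nn.softmax(logits, axis=-1)
        return weights @ v                   # [L, d]

    def relu_attention(q, k, v):
        # Same logits, but the softmax is replaced by a point-wise ReLU and the
        # result is divided by the sequence length L (the mitigation noted above).
        L, d = q.shape
        logits = q @ k.T / jnp.sqrt(d)
        weights = jax.nn.relu(logits) / L    # rows no longer sum to 1
        return weights @ v

    # Toy usage: random [L, d] inputs give identical output shapes for both variants.
    key = jax.random.PRNGKey(0)
    kq, kk, kv = jax.random.split(key, 3)
    L, d = 16, 8
    q = jax.random.normal(kq, (L, d))
    k = jax.random.normal(kk, (L, d))
    v = jax.random.normal(kv, (L, d))
    print(softmax_attention(q, k, v).shape)  # (16, 8)
    print(relu_attention(q, k, v).shape)     # (16, 8)
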
Bibliographic Details
Main Authors: Wortsman, Mitchell; Lee, Jaehoon; Gilmer, Justin; Kornblith, Simon
Format: Journal Article
Language: English
Published: 15.09.2023
Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning
DOI: 10.48550/arxiv.2309.08586
Online Access: https://arxiv.org/abs/2309.08586
Copyright: http://arxiv.org/licenses/nonexclusive-distrib/1.0
