Replacing softmax with ReLU in Vision Transformers

Bibliographic Details
Main Authors	Wortsman, Mitchell; Lee, Jaehoon; Gilmer, Justin; Kornblith, Simon
Format	Journal Article (arXiv preprint; not peer reviewed)
Language	English
Published	15.09.2023
Subjects	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning

Abstract	Previous research observed accuracy degradation when replacing the attention softmax with a point-wise activation such as ReLU. In the context of vision transformers, we find that this degradation is mitigated when dividing by sequence length. Our experiments training small to large vision transformers on ImageNet-21k indicate that ReLU-attention can approach or match the performance of softmax-attention in terms of scaling behavior as a function of compute.
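The abstract contrasts two ways of turning attention scores into weights. The following is a minimal sketch, assuming standard scaled dot-product scores s_ij = q_i · k_j / sqrt(d). Softmax normalizes each row of scores. The ReLU variant applies relu point-wise and divides by the sequence length L, taken here as the number of key positions, giving alpha_ij = relu(s_ij) / L. All function names are hypothetical, and this is an illustration rather than the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul_t(a: Matrix, b: Matrix) -> Matrix:
    """Return a @ b.T for row-major lists of lists (rows of a dotted with rows of b)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def softmax_attention_weights(q: Matrix, k: Matrix) -> Matrix:
    """Standard attention weights: row-wise softmax of q k^T / sqrt(d)."""
    d = len(q[0])
    weights = []
    for row in matmul_t(q, k):
        scaled = [s / math.sqrt(d) for s in row]
        m = max(scaled)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def relu_attention_weights(q: Matrix, k: Matrix) -> Matrix:
    """ReLU attention weights: relu(q k^T / sqrt(d)) divided by sequence length L.

    Assumption for this sketch: L is the number of keys (len(k)).
    """
    d = len(q[0])
    seq_len = len(k)
    return [[max(0.0, s / math.sqrt(d)) / seq_len for s in row] for row in matmul_t(q, k)]


def attend(weights: Matrix, v: Matrix) -> Matrix:
    """Weighted sum of value rows: output_i = sum_j weights[i][j] * v[j]."""
    dv = len(v[0])
    return [[sum(w * v[j][c] for j, w in enumerate(row)) for c in range(dv)] for row in weights]


# Usage: two queries, three keys/values of width 2 and 1.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0]]
v = [[1.0], [2.0], [3.0]]
relu_out = attend(relu_attention_weights(q, k), v)
softmax_out = attend(softmax_attention_weights(q, k), v)
```

Unlike softmax rows, the ReLU rows are not forced to sum to 1, and negative scores are zeroed rather than merely down-weighted. One plausible reading of the 1/L factor is that it keeps the scale of each output row from growing with the number of keys, loosely mimicking the unit row sum of softmax.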
BackLink	https://doi.org/10.48550/arXiv.2309.08586 (View paper in arXiv)
Copyright	http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI	10.48550/arxiv.2309.08586
DatabaseName	arXiv Computer Science; arXiv.org
IngestDate	Mon Jan 08 05:40:32 EST 2024
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
OpenAccessLink	https://arxiv.org/abs/2309.08586
PublicationDate	2023-09-15
PublicationDecade	2020
PublicationYear	2023
SecondaryResourceType	preprint
SourceType	Open Access Repository
linkProvider	Cornell University