SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning
Published in | Computer Vision - ECCV 2022, Vol. 13671, pp. 620-640
Main Authors | Kong, Zhenglun; Dong, Peiyan; Ma, Xiaolong; Meng, Xin; Niu, Wei; Sun, Mengshu; Shen, Xuan; Yuan, Geng; Ren, Bin; Tang, Hao; Qin, Minghai; Wang, Yanzhi
Format | Book Chapter
Language | English
Published | Switzerland: Springer Nature Switzerland, 2022
Series | Lecture Notes in Computer Science
Subjects | FPGA; Hardware acceleration; Mobile devices; Model compression; Vision transformer
Abstract | Recently, the Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while its high computation and memory costs make its adoption in industrial production difficult. Considering the computation complexity, the internal data pattern of ViTs, and edge-device deployment, we propose a latency-aware soft token pruning framework, SPViT, which can be set up on vanilla Transformers of both flat and hierarchical structures, such as DeiTs and Swin-Transformers (Swin). More concretely, we design a dynamic attention-based multi-head token selector, a lightweight module for adaptive instance-wise token selection. We further introduce a soft pruning technique that integrates the less informative tokens chosen by the selector module into a package token rather than discarding them completely. SPViT is bound to the trade-off between the accuracy and latency requirements of specific edge devices through our proposed latency-aware training strategy. Experimental results show that SPViT significantly reduces the computation cost of ViTs with comparable performance on image classification. Moreover, SPViT guarantees that the identified model meets the latency specifications of mobile devices and FPGAs, and even achieves real-time execution of DeiT-T on mobile devices. For example, SPViT reduces the latency of DeiT-T to 26 ms (26%-41% better than existing works) on a mobile device with 0.25%-4% higher top-1 accuracy on ImageNet. Our code is released at https://github.com/PeiyanFlying/SPViT. |
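To make the mechanism concrete, below is a minimal PyTorch-style sketch of the soft pruning idea the abstract describes: a lightweight selector scores every token, the top-scoring tokens are kept, and the remainder are fused into a single package token instead of being dropped. The class and names here (SoftTokenPruning, keep_ratio, the selector MLP) are illustrative assumptions, not the authors' released API; see the linked repository for the actual implementation.

```python
import torch
import torch.nn as nn


class SoftTokenPruning(nn.Module):
    """Sketch of soft token pruning: score tokens, keep the top fraction,
    and fuse the rest into one "package" token so their information is
    aggregated rather than discarded (hypothetical names, not SPViT's API)."""

    def __init__(self, dim: int, keep_ratio: float = 0.7):
        super().__init__()
        self.keep_ratio = keep_ratio
        # Lightweight selector: maps each token embedding to a keep score in (0, 1).
        self.selector = nn.Sequential(
            nn.Linear(dim, dim // 4),
            nn.GELU(),
            nn.Linear(dim // 4, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1 + num_patches, dim); token 0 is the class token, always kept.
        cls_tok, patches = x[:, :1], x[:, 1:]
        scores = self.selector(patches).squeeze(-1)        # (batch, n)
        n_keep = max(1, int(patches.shape[1] * self.keep_ratio))

        # Indices of the most informative tokens, re-sorted to preserve order.
        idx = scores.topk(n_keep, dim=1).indices.sort(dim=1).values
        kept = torch.gather(
            patches, 1, idx.unsqueeze(-1).expand(-1, -1, patches.shape[-1])
        )

        # Score-weighted average of the pruned tokens forms the package token.
        keep_mask = torch.zeros_like(scores)
        keep_mask.scatter_(1, idx, 1.0)
        w = scores * (1.0 - keep_mask)                     # weights of pruned tokens only
        package = (w.unsqueeze(-1) * patches).sum(dim=1, keepdim=True) / (
            w.sum(dim=1, keepdim=True).unsqueeze(-1) + 1e-6
        )

        return torch.cat([cls_tok, kept, package], dim=1)  # shorter token sequence
```

In the framework the abstract outlines, a module of this kind would sit between Transformer blocks, with the per-layer keep ratio steered by the latency-aware training strategy toward a device-specific latency budget.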
Author | Meng, Xin; Niu, Wei; Sun, Mengshu; Qin, Minghai; Dong, Peiyan; Kong, Zhenglun; Yuan, Geng; Ma, Xiaolong; Shen, Xuan; Ren, Bin; Tang, Hao; Wang, Yanzhi
ContentType | Book Chapter |
Copyright | The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 |
DEWEY | 006.37 |
DOI | 10.1007/978-3-031-20083-0_37 |
Discipline | Applied Sciences; Computer Science
EISBN | 3031200837 9783031200830 |
EISSN | 1611-3349 |
Editor | Farinella, Giovanni Maria; Avidan, Shai; Cissé, Moustapha; Brostow, Gabriel; Hassner, Tal
EndPage | 640 |
ISBN | 3031200829 9783031200823 |
ISSN | 0302-9743 |
IsPeerReviewed | true |
IsScholarly | true |
LCCallNum | TA1634 |
Language | English |
Notes | Z. Kong and P. Dong contributed equally. Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-20083-0_37.
OCLC | 1350794961 |
ORCID | Kong, Zhenglun: 0000-0002-8120-4456; Dong, Peiyan: 0000-0001-5287-5149; Ma, Xiaolong: 0000-0003-1392-2787; Meng, Xin: 0000-0003-2228-0587; Niu, Wei: 0000-0002-2697-7042; Sun, Mengshu: 0000-0003-3540-1464; Shen, Xuan: 0000-0003-4965-7321; Yuan, Geng: 0000-0001-9844-992X; Ren, Bin: 0000-0002-4116-5237; Tang, Hao: 0000-0002-2077-1246; Qin, Minghai: 0000-0001-5172-5309; Wang, Yanzhi: 0000-0002-3024-7990
PageCount | 21 |
PublicationDate | 2022-11-03
PublicationPlace | Cham, Switzerland
PublicationSeriesTitle | Lecture Notes in Computer Science |
PublicationSeriesTitleAlternate | Lect.Notes Computer |
PublicationSubtitle | 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XI |
PublicationTitle | Computer Vision - ECCV 2022 |
PublicationYear | 2022 |
Publisher | Springer; Springer Nature Switzerland
RelatedPersons | Goos, Gerhard; Hartmanis, Juris; Bertino, Elisa; Gao, Wen; Steffen, Bernhard; Yung, Moti
StartPage | 620 |
SubjectTerms | FPGA; Hardware acceleration; Mobile devices; Model compression; Vision transformer
Title | SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning |
URI | http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=7130706&ppg=676 http://link.springer.com/10.1007/978-3-031-20083-0_37 |
Volume | 13671 |