SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning

Bibliographic Details
Published in Computer Vision - ECCV 2022, Vol. 13671, pp. 620-640
Main Authors Kong, Zhenglun, Dong, Peiyan, Ma, Xiaolong, Meng, Xin, Niu, Wei, Sun, Mengshu, Shen, Xuan, Yuan, Geng, Ren, Bin, Tang, Hao, Qin, Minghai, Wang, Yanzhi
Format Book Chapter
Language English
Published Switzerland: Springer Nature Switzerland, 2022
Series Lecture Notes in Computer Science
Subjects FPGA; Hardware acceleration; Mobile devices; Model compression; Vision transformer

Abstract Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while the high computation and memory cost makes its propagation in industrial production difficult. Considering the computation complexity, the internal data pattern of ViTs, and the edge device deployment, we propose a latency-aware soft token pruning framework, SPViT, which can be set up on vanilla Transformers of both flatten and hierarchical structures, such as DeiTs and Swin-Transformers (Swin). More concretely, we design a dynamic attention-based multi-head token selector, which is a lightweight module for adaptive instance-wise token selection. We further introduce a soft pruning technique, which integrates the less informative tokens chosen by the selector module into a package token rather than discarding them completely. SPViT is bound to the trade-off between accuracy and latency requirements of specific edge devices through our proposed latency-aware training strategy. Experiment results show that SPViT significantly reduces the computation cost of ViTs with comparable performance on image classification. Moreover, SPViT can guarantee the identified model meets the latency specifications of mobile devices and FPGA, and even achieve the real-time execution of DeiT-T on mobile devices. For example, SPViT reduces the latency of DeiT-T to 26 ms (26%−41% superior to existing works) on the mobile device with 0.25%−4% higher top-1 accuracy on ImageNet. Our code is released at https://github.com/PeiyanFlying/SPViT.
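The abstract describes two mechanisms that a short sketch can make concrete: a lightweight per-token selector and the soft-pruning step that fuses the less informative tokens into a single "package" token rather than discarding them. The PyTorch sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the selector here is a plain MLP scorer with hard top-k selection, whereas the paper uses an attention-based multi-head selector trained end to end, and SoftTokenPruner, keep_ratio, and all other names are hypothetical. The official code is at https://github.com/PeiyanFlying/SPViT.

import torch
import torch.nn as nn

class SoftTokenPruner(nn.Module):
    """Scores patch tokens, keeps the top fraction, and fuses the rest
    into a single "package" token instead of discarding them outright."""

    def __init__(self, dim: int, keep_ratio: float = 0.7):
        super().__init__()
        self.keep_ratio = keep_ratio
        # Lightweight scoring head producing a per-token keep score in (0, 1).
        # (A stand-in for the paper's attention-based multi-head selector.)
        self.selector = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim // 4),
            nn.GELU(),
            nn.Linear(dim // 4, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1 + num_patches, dim); token 0 is the class token.
        cls_tok, patches = x[:, :1], x[:, 1:]
        scores = self.selector(patches).squeeze(-1)        # (B, N)
        n_keep = max(1, int(patches.shape[1] * self.keep_ratio))
        keep_idx = scores.topk(n_keep, dim=1).indices      # most informative tokens

        batch = torch.arange(x.shape[0], device=x.device).unsqueeze(1)
        kept = patches[batch, keep_idx]                    # (B, n_keep, dim)

        # Soft pruning: a score-weighted average of the remaining tokens
        # becomes one package token, so their information is not lost.
        pruned_mask = torch.ones_like(scores, dtype=torch.bool)
        pruned_mask[batch, keep_idx] = False
        w = (scores * pruned_mask).unsqueeze(-1)           # zero weight on kept tokens
        package = (patches * w).sum(1, keepdim=True) / (w.sum(1, keepdim=True) + 1e-6)

        return torch.cat([cls_tok, kept, package], dim=1)

Inserted between transformer blocks (e.g. SoftTokenPruner(dim=192) for a DeiT-T-sized model), such a module shrinks the token count stage by stage. In the actual method the hard top-k would be replaced by a differentiable selection so that the selector and the latency-aware training objective can be optimized jointly.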
Author_xml – sequence: 1
  givenname: Zhenglun
  orcidid: 0000-0002-8120-4456
  surname: Kong
  fullname: Kong, Zhenglun
– sequence: 2
  givenname: Peiyan
  orcidid: 0000-0001-5287-5149
  surname: Dong
  fullname: Dong, Peiyan
– sequence: 3
  givenname: Xiaolong
  orcidid: 0000-0003-1392-2787
  surname: Ma
  fullname: Ma, Xiaolong
– sequence: 4
  givenname: Xin
  orcidid: 0000-0003-2228-0587
  surname: Meng
  fullname: Meng, Xin
– sequence: 5
  givenname: Wei
  orcidid: 0000-0002-2697-7042
  surname: Niu
  fullname: Niu, Wei
– sequence: 6
  givenname: Mengshu
  orcidid: 0000-0003-3540-1464
  surname: Sun
  fullname: Sun, Mengshu
– sequence: 7
  givenname: Xuan
  orcidid: 0000-0003-4965-7321
  surname: Shen
  fullname: Shen, Xuan
– sequence: 8
  givenname: Geng
  orcidid: 0000-0001-9844-992X
  surname: Yuan
  fullname: Yuan, Geng
– sequence: 9
  givenname: Bin
  orcidid: 0000-0002-4116-5237
  surname: Ren
  fullname: Ren, Bin
– sequence: 10
  givenname: Hao
  orcidid: 0000-0002-2077-1246
  surname: Tang
  fullname: Tang, Hao
– sequence: 11
  givenname: Minghai
  orcidid: 0000-0001-5172-5309
  surname: Qin
  fullname: Qin, Minghai
– sequence: 12
  givenname: Yanzhi
  orcidid: 0000-0002-3024-7990
  surname: Wang
  fullname: Wang, Yanzhi
  email: yanz.wang@northeastern.edu
ContentType Book Chapter
Copyright The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
DEWEY 006.37
DOI 10.1007/978-3-031-20083-0_37
Discipline Applied Sciences
Computer Science
EISBN 3031200837
9783031200830
EISSN 1611-3349
Editor_xml – sequence: 1
  fullname: Avidan, Shai
– sequence: 2
  fullname: Cissé, Moustapha
– sequence: 3
  fullname: Farinella, Giovanni Maria
– sequence: 4
  fullname: Brostow, Gabriel
– sequence: 5
  fullname: Hassner, Tal
EndPage 640
ISBN 3031200829
9783031200823
ISSN 0302-9743
IsPeerReviewed true
IsScholarly true
LCCallNum TA1634
Language English
Notes Z. Kong and P. Dong—Both authors contributed equally.
Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-20083-0_37.
OCLC 1350794961
PageCount 21
PublicationCentury 2000
PublicationDate 2022-11-03
PublicationDateYYYYMMDD 2022-11-03
PublicationDate_xml – year: 2022
  text: 2022
PublicationDecade 2020
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect. Notes Comput. Sci.
PublicationSubtitle 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XI
PublicationTitle Computer Vision - ECCV 2022
PublicationYear 2022
Publisher Springer
Springer Nature Switzerland
Publisher_xml – name: Springer
– name: Springer Nature Switzerland
RelatedPersons Hartmanis, Juris
Gao, Wen
Steffen, Bernhard
Bertino, Elisa
Goos, Gerhard
Yung, Moti
RelatedPersons_xml – sequence: 1
  givenname: Gerhard
  surname: Goos
  fullname: Goos, Gerhard
– sequence: 2
  givenname: Juris
  surname: Hartmanis
  fullname: Hartmanis, Juris
– sequence: 3
  givenname: Elisa
  surname: Bertino
  fullname: Bertino, Elisa
– sequence: 4
  givenname: Wen
  surname: Gao
  fullname: Gao, Wen
– sequence: 5
  givenname: Bernhard
  orcidid: 0000-0001-9619-1558
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 6
  givenname: Moti
  orcidid: 0000-0003-0848-0873
  surname: Yung
  fullname: Yung, Moti
StartPage 620
SubjectTerms FPGA
Hardware acceleration
Mobile devices
Model compression
Vision transformer
Title SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning
URI http://link.springer.com/10.1007/978-3-031-20083-0_37
Volume 13671