SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning
Published in | Computer Vision - ECCV 2022, Vol. 13671, pp. 620-640
Main Authors | Kong, Zhenglun; Dong, Peiyan; Ma, Xiaolong; Meng, Xin; Niu, Wei; Sun, Mengshu; Shen, Xuan; Yuan, Geng; Ren, Bin; Tang, Hao; Qin, Minghai; Wang, Yanzhi
Format | Book Chapter
Language | English
Published | Switzerland: Springer Nature Switzerland, 2022
Series | Lecture Notes in Computer Science
Subjects | FPGA; Hardware acceleration; Mobile devices; Model compression; Vision transformer
Abstract | Recently, the Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while its high computation and memory costs make its adoption in industrial production difficult. Considering the computation complexity, the internal data pattern of ViTs, and edge-device deployment, we propose a latency-aware soft token pruning framework, SPViT, which can be set up on vanilla Transformers of both flat and hierarchical structures, such as DeiTs and Swin-Transformers (Swin). More concretely, we design a dynamic attention-based multi-head token selector, a lightweight module for adaptive instance-wise token selection. We further introduce a soft pruning technique that integrates the less informative tokens chosen by the selector module into a package token rather than discarding them completely. SPViT is bound to the trade-off between the accuracy and latency requirements of specific edge devices through our proposed latency-aware training strategy. Experimental results show that SPViT significantly reduces the computation cost of ViTs with comparable performance on image classification. Moreover, SPViT guarantees that the identified model meets the latency specifications of mobile devices and FPGAs, and even achieves real-time execution of DeiT-T on mobile devices. For example, SPViT reduces the latency of DeiT-T to 26 ms (26%-41% better than existing works) on a mobile device with 0.25%-4% higher top-1 accuracy on ImageNet. Our code is released at https://github.com/PeiyanFlying/SPViT. |
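To make the mechanism concrete, below is a minimal PyTorch-style sketch of the soft pruning idea the abstract describes: a lightweight selector scores every token, the top-scoring tokens are kept, and the remainder are fused into a single package token instead of being dropped. The class and names here (SoftTokenPruning, keep_ratio, the selector MLP) are illustrative assumptions, not the authors' released API; see the linked repository for the actual implementation.

```python
import torch
import torch.nn as nn


class SoftTokenPruning(nn.Module):
    """Sketch of soft token pruning: score tokens, keep the top fraction,
    and fuse the rest into one "package" token so their information is
    aggregated rather than discarded (hypothetical names, not SPViT's API)."""

    def __init__(self, dim: int, keep_ratio: float = 0.7):
        super().__init__()
        self.keep_ratio = keep_ratio
        # Lightweight selector: maps each token embedding to a keep score in (0, 1).
        self.selector = nn.Sequential(
            nn.Linear(dim, dim // 4),
            nn.GELU(),
            nn.Linear(dim // 4, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1 + num_patches, dim); token 0 is the class token, always kept.
        cls_tok, patches = x[:, :1], x[:, 1:]
        scores = self.selector(patches).squeeze(-1)        # (batch, n)
        n_keep = max(1, int(patches.shape[1] * self.keep_ratio))

        # Indices of the most informative tokens, re-sorted to preserve order.
        idx = scores.topk(n_keep, dim=1).indices.sort(dim=1).values
        kept = torch.gather(
            patches, 1, idx.unsqueeze(-1).expand(-1, -1, patches.shape[-1])
        )

        # Score-weighted average of the pruned tokens forms the package token.
        keep_mask = torch.zeros_like(scores)
        keep_mask.scatter_(1, idx, 1.0)
        w = scores * (1.0 - keep_mask)                     # weights of pruned tokens only
        package = (w.unsqueeze(-1) * patches).sum(dim=1, keepdim=True) / (
            w.sum(dim=1, keepdim=True).unsqueeze(-1) + 1e-6
        )

        return torch.cat([cls_tok, kept, package], dim=1)  # shorter token sequence
```

In the framework the abstract outlines, a module of this kind would sit between Transformer blocks, with the per-layer keep ratio steered by the latency-aware training strategy toward a device-specific latency budget.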
Author | Meng, Xin; Niu, Wei; Sun, Mengshu; Qin, Minghai; Dong, Peiyan; Kong, Zhenglun; Yuan, Geng; Ma, Xiaolong; Shen, Xuan; Ren, Bin; Tang, Hao; Wang, Yanzhi
ContentType | Book Chapter |
Copyright | The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 |
DEWEY | 006.37 |
DOI | 10.1007/978-3-031-20083-0_37 |
Discipline | Applied Sciences; Computer Science
EISBN | 3031200837 9783031200830 |
EISSN | 1611-3349 |
Editor | Farinella, Giovanni Maria; Avidan, Shai; Cissé, Moustapha; Brostow, Gabriel; Hassner, Tal
EndPage | 640 |
ISBN | 3031200829 9783031200823 |
ISSN | 0302-9743 |
IsPeerReviewed | true |
IsScholarly | true |
LCCallNum | TA1634 |
Language | English |
Notes | Z. Kong and P. Dong contributed equally. Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-20083-0_37.
OCLC | 1350794961 |
ORCID | Kong, Zhenglun: 0000-0002-8120-4456; Dong, Peiyan: 0000-0001-5287-5149; Ma, Xiaolong: 0000-0003-1392-2787; Meng, Xin: 0000-0003-2228-0587; Niu, Wei: 0000-0002-2697-7042; Sun, Mengshu: 0000-0003-3540-1464; Shen, Xuan: 0000-0003-4965-7321; Yuan, Geng: 0000-0001-9844-992X; Ren, Bin: 0000-0002-4116-5237; Tang, Hao: 0000-0002-2077-1246; Qin, Minghai: 0000-0001-5172-5309; Wang, Yanzhi: 0000-0002-3024-7990
PageCount | 21 |
PublicationDate | 2022-11-03
PublicationPlace | Cham, Switzerland
PublicationSeriesTitle | Lecture Notes in Computer Science |
PublicationSeriesTitleAlternate | Lect.Notes Computer |
PublicationSubtitle | 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XI |
PublicationTitle | Computer Vision - ECCV 2022 |
PublicationYear | 2022 |
Publisher | Springer; Springer Nature Switzerland
RelatedPersons | Goos, Gerhard; Hartmanis, Juris; Bertino, Elisa; Gao, Wen; Steffen, Bernhard; Yung, Moti
StartPage | 620 |
SubjectTerms | FPGA; Hardware acceleration; Mobile devices; Model compression; Vision transformer
Title | SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning |
URI | http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=7130706&ppg=676 http://link.springer.com/10.1007/978-3-031-20083-0_37 |
Volume | 13671 |