Lightweight transformer image feature extraction network

Bibliographic Details
Published in	PeerJ. Computer science Vol. 10; p. e1755
Main Authors	Zheng, Wenfeng, Lu, Siyu, Yang, Youshuai, Yin, Zhengtong, Yin, Lirong
Format	Journal Article
Language	English
Published	United States: PeerJ Inc (PeerJ. Ltd), 31.01.2024
Subjects	Analysis; Artificial Intelligence; Computational linguistics; Computer Vision; Efficient attention; Electric transformers; Image feature extraction; Language processing; Natural language interfaces; Pruning; Quadratic complexity; Self-attention mechanism; Telecommunication systems; Transformer

More Information
Summary:	Transformer-based image feature extraction has become an active research topic in recent years. However, the computational complexity of a Transformer grows quadratically with the number of input tokens, which makes vision Transformer backbone networks computationally expensive and prevents them from modelling high-resolution images. To address this issue, this study proposes two approaches to speed up Transformer models. First, the quadratic complexity of the self-attention mechanism is reduced to linear, which speeds up the model’s internal processing. Second, a parameter-free lightweight pruning method is introduced that adaptively samples the input image to filter out unimportant tokens, reducing irrelevant input. Finally, the two methods are combined to create an efficient attention mechanism. Experimental results show that the combined methods reduce the computation of the original Transformer model by 30%–50%, while the efficient attention mechanism achieves a 60%–70% reduction in computation.
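
The summary does not say how the authors linearize attention or score tokens, so the following is only a minimal sketch of the two general ideas, not the paper's method. It assumes a generic kernel-feature-map form of linear attention (with `elu(x) + 1` as the feature map) and a placeholder pruning score (the L2 norm of each token). All function names and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a strictly positive feature map.
    # Assumption: the summary names no specific kernel.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Kernelized attention in O(N * d^2): never builds an N x N matrix."""
    q, k = feature_map(q), feature_map(k)
    kv = k.T @ v               # (d, d_v) summary of all keys and values
    z = q @ k.sum(axis=0)      # (N,) per-query normalizer
    return (q @ kv) / z[:, None]

def quadratic_reference(q, k, v):
    """Same kernel, but materializing the full N x N weights: O(N^2 * d)."""
    q, k = feature_map(q), feature_map(k)
    w = q @ k.T                # (N, N) attention weights
    return (w / w.sum(axis=1, keepdims=True)) @ v

def prune_tokens(tokens, keep_ratio):
    """Parameter-free pruning sketch: keep the highest-scoring tokens.

    The L2-norm score is a hypothetical stand-in for the adaptive
    sampling described in the summary.
    """
    n_keep = max(1, int(round(keep_ratio * tokens.shape[0])))
    scores = np.linalg.norm(tokens, axis=1)
    keep = np.sort(np.argsort(scores)[-n_keep:])  # keep original token order
    return tokens[keep], keep

rng = np.random.default_rng(0)
n, d = 196, 64                 # e.g. 14 x 14 patches, 64-dim heads (illustrative)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))

out_linear = linear_attention(q, k, v)
out_quadratic = quadratic_reference(q, k, v)
pruned, kept = prune_tokens(v, keep_ratio=0.5)
```

Both attention functions compute the same values up to floating-point error. The linear variant only reorders the matrix products so that the cost grows with the token count N rather than with N squared. Pruning then shrinks N itself before attention is applied.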
ISSN:	2376-5992
DOI:	10.7717/peerj-cs.1755