P2T: Pyramid Pooling Transformer for Scene Understanding

Recently, the vision transformer has achieved great success by pushing forward the state of the art in various vision tasks. One of the most challenging problems in the vision transformer is that the large sequence length of image tokens leads to high computational cost (quadratic complexity). A popular sol...

Bibliographic Details
Published in	IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, no. 11, pp. 12760-12771
Main Authors	Wu, Yu-Huan, Liu, Yun, Zhan, Xin, Cheng, Ming-Ming
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2023
Subjects	backbone network; Computational modeling; Computer networks; Convolution; efficient self-attention; Feature extraction; Image classification; Image segmentation; Network design; Object recognition; pyramid pooling; Scene analysis; scene understanding; Semantic segmentation; Semantics; Task analysis; Transformer; Transformers