AST-Net: Lightweight Hybrid Transformer for Multimodal Brain Tumor Segmentation


Bibliographic Details
Published in: 2022 26th International Conference on Pattern Recognition (ICPR), pp. 4623 - 4629
Main Authors: Wang, Peixu; Liu, Shikun; Peng, Jialin
Format: Conference Proceeding
Language: English
Published: IEEE, 21.08.2022

More Information
Summary: Encoder-decoder networks based on local convolutions have shown state-of-the-art results on various medical image segmentation tasks. However, they have limited ability to capture long-range spatial context, which has motivated the development of Transformers with attention mechanisms. Despite their success, Transformers usually have limitations in processing large medical image volumes due to their high computational complexity and reliance on large-scale pre-training. Hence, we introduce a hybrid encoder-decoder that utilizes both lightweight convolution modules and an axial-spatial transformer (AST) module in the encoder. To capture better multi-view and multi-scale features, we integrate axial and spatial attention in the AST module to learn long-range dependencies, while convolution operations extract local dependencies and rich local features. Compared to pure vision transformers, the hybrid model has far fewer learnable parameters, which is desirable for clinical usage. Experimental results on three challenging benchmarks demonstrate the competitive performance of the proposed model against state-of-the-art methods.
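The core efficiency argument behind axial attention can be illustrated with a minimal NumPy sketch (an assumption for illustration, not the paper's implementation): attending along one spatial axis at a time reduces the cost of full 2-D self-attention from O((HW)^2) to O(HW·(H+W)). The function name `axial_attention` and the identity query/key/value projections below are hypothetical simplifications; the actual AST module uses learned projections and combines axial with spatial attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x, axis):
    # x: (H, W, C) feature map; attend along ONE spatial axis only.
    # Moving the chosen axis to position -2 gives a shared (..., L, C) layout.
    x = np.moveaxis(x, axis, -2)
    q, k, v = x, x, x  # identity projections: illustration only (learned in practice)
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(x.shape[-1])  # (..., L, L)
    out = softmax(scores, axis=-1) @ v                          # (..., L, C)
    return np.moveaxis(out, -2, axis)

# Sequential attention along height, then width: each token still reaches
# every other token in two hops, at O(H*W*(H+W)) cost instead of O((H*W)^2).
H, W, C = 8, 8, 4
feat = np.random.rand(H, W, C)
out = axial_attention(axial_attention(feat, 0), 1)
print(out.shape)  # (8, 8, 4)
```

The same two-pass pattern generalizes to 3-D medical volumes (three axial passes over D, H, W), which is what makes attention tractable on volumetric data without large-scale pre-training.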
ISSN:2831-7475
DOI:10.1109/ICPR56361.2022.9956705