AST-Net: Lightweight Hybrid Transformer for Multimodal Brain Tumor Segmentation
Published in: 2022 26th International Conference on Pattern Recognition (ICPR), pp. 4623-4629
Main Authors: , ,
Format: Conference Proceeding
Language: English
Published: IEEE, 21.08.2022
Summary: Encoder-decoder networks based on local convolutions have shown state-of-the-art results on various medical image segmentation tasks. However, they have limited ability to capture long-range spatial context, which has motivated the development of Transformers with attention mechanisms. Despite their success, Transformers usually have limitations in processing large volumes of medical image data due to their high computational complexity and reliance on large-scale pre-training. Hence, we introduce a hybrid encoder-decoder that utilizes both lightweight convolution modules and an axial-spatial transformer (AST) module in the encoder. To better capture multi-view and multi-scale features, we integrate axial and spatial attention in the AST module to learn long-range dependencies, while convolution operations extract local dependencies and rich local features. Compared to pure vision transformers, the hybrid model has far fewer learnable parameters, which is desirable for clinical usage. Experimental results on three challenging benchmarks demonstrate the competitive performance of the proposed model against state-of-the-art methods.
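The efficiency argument behind axial attention can be illustrated concretely. Full 2-D self-attention compares all H*W positions with each other (O((HW)^2) score entries), whereas axial attention runs attention along each axis in turn (O(HW(H+W)) entries) while still composing into a global receptive field. The sketch below is a minimal single-head NumPy illustration of that idea; it is not the paper's AST module (whose exact design, multi-view fusion, and convolutional components are not given in this record), and all function names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product self-attention over the second-to-last axis.
    # q, k, v: (..., n, d) -> output (..., n, d).
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)  # (..., n, n)
    return softmax(scores, axis=-1) @ v

def axial_attention(x):
    # x: (H, W, d) feature map. Attend along the width axis (within each
    # row), then along the height axis (within each column). Two 1-D
    # passes give every position a path to every other position at a
    # fraction of the cost of full 2-D attention.
    h = attention(x, x, x)            # row-wise: each row attends over W
    xt = np.swapaxes(h, 0, 1)         # (W, H, d)
    w = attention(xt, xt, xt)         # column-wise: each column attends over H
    return np.swapaxes(w, 0, 1)       # back to (H, W, d)

x = np.random.default_rng(0).normal(size=(8, 8, 4))
y = axial_attention(x)
print(y.shape)  # (8, 8, 4): same shape as the input feature map
```

In a real 3-D segmentation network the same trick is applied per volume axis (and typically with learned query/key/value projections and multiple heads), which is what keeps the attention cost tractable on large volumetric inputs.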
ISSN: 2831-7475
DOI: 10.1109/ICPR56361.2022.9956705