CascadeMedSeg: integrating pyramid vision transformer with multi-scale fusion for precise medical image segmentation
Published in | Signal, image and video processing Vol. 18; no. 12; pp. 9067 - 9079 |
---|---|
Format | Journal Article |
Language | English |
Published | London: Springer London, 01.12.2024; Springer Nature B.V |
Summary: | Medical image segmentation (MIS) is a key technique in computer-aided diagnosis. With the development of deep learning, especially convolutional neural networks, the performance of MIS has improved significantly; however, some mainstream convolution-based methods still suffer from inaccurate target boundaries and imprecise segmentation results. At the same time, transformer-based methods have gradually achieved better segmentation results. To overcome the challenges of traditional methods, this paper proposes an accurate MIS model, CascadeMedSeg, which combines a pyramid vision transformer (PVT) with multi-scale fusion. The model follows a standard encoder-decoder segmentation architecture in which PVT serves as the encoder. PVT, designed as a pure Transformer backbone for pixel-level dense prediction tasks, can consistently generate a global receptive field and, as an encoder, flexibly learn multi-scale features of medical images. Two additional modules are introduced: Enhanced Attention Fusion (EAF) and Edge-Enhanced Segmentation (EES). The EAF module fuses up-sampled and skip-connected features using an attention mechanism that enhances the perception of channel and positional information. The EES module enhances the boundary features of the network by aggregating multi-level encoder features and applying a dynamic boundary detection operator to obtain a boundary mask, which is then embedded into the decoder. Extensive experiments on five datasets show that CascadeMedSeg outperforms several state-of-the-art methods. The MIoU values for the Kvasir-SEG, CVC-ClinicDB, ISIC 2018, and BUSI datasets are 88.16%, 89.79%, 86.32%, and 66.69%, respectively. |
---|---|
ISSN: | 1863-1703 1863-1711 |
DOI: | 10.1007/s11760-024-03530-5 |
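
The summary reports results as MIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how such a score is commonly computed for label masks. It is not taken from the paper: the function name `mean_iou`, the choice to average over classes present in either mask, and the toy masks are all assumptions made for illustration, and the authors' exact evaluation protocol is not stated in this record.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int = 2) -> float:
    """Mean intersection-over-union across classes (hypothetical helper).

    Assumption: masks are flattened label sequences, and a class that is
    absent from both masks is skipped rather than counted as IoU 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both masks.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one mask.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened 2x3 binary masks (1 = lesion, 0 = background); illustrative values only.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(f"MIoU = {100 * mean_iou(pred, target):.2f}%")
```

Published benchmarks differ in whether the mean is taken over classes or over images, so scores computed this way are not necessarily comparable with the figures quoted in the summary.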