HD-Former: A hierarchical dependency Transformer for medical image segmentation

Bibliographic Details
Published in	Computers in biology and medicine Vol. 178; p. 108671
Main Authors	Wu, Haifan; Min, Weidong; Gai, Di; Huang, Zheng; Geng, Yuhan; Wang, Qi; Chen, Ruibin
Format	Journal Article
Language	English
Published	United States: Elsevier Ltd, 01.08.2024
Subjects	Algorithms; Artificial neural networks; Compressed bottleneck; Consistent feature space; Dual cross attention Transformer; Hierarchical dependencies; Humans; Image enhancement; Image processing; Image Processing, Computer-Assisted - methods; Image segmentation; Internal Medicine; Medical image segmentation; Medical imaging; Modules; Multilevel; Neural networks; Neural Networks, Computer; Other; Performance evaluation; Semantics; State-of-the-art reviews; Transformers
Summary:	Medical image segmentation is a compelling fundamental problem and an important auxiliary tool for clinical applications. Recently, the Transformer has emerged as a valuable tool for addressing the limitations of convolutional neural networks (CNNs) by effectively capturing global relationships, and numerous hybrid architectures combining CNNs and Transformers have been devised to enhance segmentation performance. However, these hybrids suffer from multilevel semantic feature gaps and fail to account for multilevel dependencies between space and channel. In this paper, we propose a hierarchical dependency Transformer for medical image segmentation, named HD-Former. First, we utilize a Compressed Bottleneck (CB) module to enrich shallow features and localize the target region. We then introduce the Dual Cross Attention Transformer (DCAT) module to fuse multilevel features and bridge the feature gap. In addition, we design the broad exploration network (BEN), which cascades convolution and self-attention from different percepts to capture hierarchical dense contextual semantic features locally and globally. Finally, we exploit an uncertain multitask edge loss to adaptively map predictions to a consistent feature space, which optimizes segmentation edges. Extensive experiments on the ISIC, LiTS, Kvasir-SEG, and CVC-ClinicDB medical image segmentation datasets demonstrate that HD-Former surpasses state-of-the-art methods in both subjective visual performance and objective evaluation. Code: https://github.com/barcelonacontrol/HD-Former.
• Enhance the global information extraction ability of the model.
• Design CB to capture shape perception features.
• Utilize DCAT to fuse multi-level features.
• Propose BEM to capture dense semantic contexts.
• Introduce an uncertainty guided multitask loss function.
ISSN:	0010-4825, 1879-0534
DOI:	10.1016/j.compbiomed.2024.108671
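
Note on the uncertainty guided multitask loss named in the summary: this record does not give its formula. The sketch below shows one widely used way to weight several task losses by learned uncertainty, where each task i has a log-variance s_i and the total is the sum of exp(-s_i) * L_i + s_i. This is an assumed, generic formulation for illustration only and may differ from the loss actually used in HD-Former. The function name and the sample values are hypothetical.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses using per-task log-variances (assumed formulation).

    total = sum_i exp(-s_i) * L_i + s_i

    A larger s_i down-weights task i, while the additive s_i term
    penalizes inflating the uncertainty without bound.
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("task_losses and log_variances must have equal length")
    total = 0.0
    for loss, s in zip(task_losses, log_variances):
        total += math.exp(-s) * loss + s
    return total


# Hypothetical values: a segmentation loss and an edge loss.
seg_loss, edge_loss = 0.8, 0.3
combined = uncertainty_weighted_loss([seg_loss, edge_loss], [0.0, 0.0])
print(combined)
```

With both log-variances at zero, every task has weight 1 and the result reduces to the plain sum of the task losses. In training, the s_i values would normally be learnable parameters optimized together with the network weights.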