A multi-scale two-branch fusion network for simultaneous segmentation in electronic laryngoscope images
Published in | Digital signal processing Vol. 140; p. 104132 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Elsevier Inc, 01.08.2023 |
Subjects | |
Summary: | Three issues limit the performance of networks that segment organs and lesions simultaneously in electronic laryngoscope images. First, the moving endoscope causes noticeable variation in the shape and viewing angle of lesions and organs. Second, the lesions, mainly polyps, differ considerably in size from the major organs. Third, the boundaries between lesions or organs and their backgrounds are often indistinct because their color and texture are very close to those of the mucosal tissue. To improve simultaneous segmentation accuracy, we propose a multi-scale two-branch fusion network (MsFusionNet). It adopts an asymmetric two-branch structure that fuses, at different scales, the fine-grained feature maps extracted by a convolutional neural network (CNN) with the global context feature maps extracted by a Vision Transformer. In addition, a Multi-scale Dark Part Feature Enhancement module (MsDFE) enhances the non-salient details of organs before feature fusion in the two-branch network. To evaluate the generality and effectiveness of the proposed method, we used a mixed dataset collected from three institutions, including 2425 electronic laryngoscope images that show the major organs of the pharynx and larynx. The results show that the proposed method outperforms nine existing segmentation networks on this dataset, indicating good potential for clinical practice.
• We design an asymmetric two-branch architecture to fuse the CNN and the Transformer. • We propose a dark part feature enhancement module (MsDFE) to fully utilize the potential information in images. • Qualitative and quantitative evaluations demonstrate the superiority of our method. |
---|---|
ISSN: | 1051-2004 1095-4333 |
DOI: | 10.1016/j.dsp.2023.104132 |
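
The summary describes fusing fine-grained CNN feature maps with global context feature maps from a Vision Transformer at different scales. The sketch below illustrates that general idea in plain Python and is not the authors' implementation. It assumes the global map is coarser by an integer factor, uses nearest-neighbour upsampling to match resolutions, and fuses by channel concatenation; the summary does not specify the fusion operator, and all function names (`upsample_nearest`, `fuse_scale`, `fuse_multi_scale`) are hypothetical.

```python
# Hypothetical sketch: a feature map is a list of channels,
# and each channel is a 2-D list of floats (rows x columns).


def upsample_nearest(channel, factor):
    """Nearest-neighbour upsampling of one 2-D channel by an integer factor."""
    return [
        [value for value in row for _ in range(factor)]
        for row in channel
        for _ in range(factor)
    ]


def fuse_scale(local_feats, global_feats):
    """Fuse one scale: upsample the global map, then concatenate channels.

    Concatenation is an assumed fusion operator chosen for illustration.
    """
    local_h, local_w = len(local_feats[0]), len(local_feats[0][0])
    global_h, global_w = len(global_feats[0]), len(global_feats[0][0])
    if local_h % global_h != 0 or local_w % global_w != 0:
        raise ValueError("local size must be an integer multiple of global size")
    factor = local_h // global_h
    if local_w // global_w != factor:
        raise ValueError("height and width must share the same scale factor")
    upsampled = [upsample_nearest(ch, factor) for ch in global_feats]
    return local_feats + upsampled


def fuse_multi_scale(local_pyramid, global_pyramid):
    """Apply the per-scale fusion to each pair of maps in two pyramids."""
    return [fuse_scale(loc, glo) for loc, glo in zip(local_pyramid, global_pyramid)]


# Toy example: one 2x2 fine-grained channel and one 1x1 global-context channel.
local_map = [[[1.0, 2.0], [3.0, 4.0]]]
global_map = [[[9.0]]]
fused = fuse_scale(local_map, global_map)
print(len(fused), "channels of size", len(fused[0]), "x", len(fused[0][0]))
```

In a real network the two branches would produce learned tensors and the fusion would typically involve learned layers; the nested lists here only make the shape bookkeeping explicit.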