A multi-scale two-branch fusion network for simultaneous segmentation in electronic laryngoscope images

Bibliographic Details
Published in	Digital signal processing Vol. 140; p. 104132
Main Authors	Chen, Hao; Hao, Mayang; Wang, Chenwu; Wang, Pei
Format	Journal Article
Language	English
Published	Elsevier Inc 01.08.2023
Subjects	Dark part feature enhancement; Electronic laryngoscope; Multi-scale features fusion; Simultaneous segmentation; Two-branch network
Summary:	Three issues reduce the performance of networks for the simultaneous segmentation of organs and lesions in electronic laryngoscope images. First, the moving endoscope causes noticeable variations in the shape and angle of lesions and organs. Second, the lesions, mainly polyps, and the major organs differ considerably in size. Third, the boundaries between the lesions or organs and their backgrounds are often indistinct, since their color and texture closely resemble those of the mucosal tissue. To improve simultaneous segmentation accuracy, we propose a multi-scale two-branch fusion network (MsFusionNet), which adopts an asymmetric two-branch structure to fuse, at different scales, the fine-grained feature maps extracted by a convolutional neural network (CNN) with the global context feature maps extracted by a Vision Transformer. In addition, a Multi-scale Dark Part Feature Enhancement module (MsDFE) is designed to enhance the non-salient details of organs before feature fusion in the two-branch network. To evaluate the universality and effectiveness of the proposed method, we used a mixed dataset collected from three institutions, comprising 2425 electronic laryngoscope images with major organs in the pharynx and larynx. The results show that the proposed method outperforms nine existing segmentation networks on the experimental dataset, indicating good potential for clinical practice.
• We design an asymmetric two-branch architecture to fuse the CNN and the Transformer.
• We propose the dark part feature enhancement module (MsDFE) to fully utilize potential information in images.
• Qualitative and quantitative evaluations demonstrate the superiority of our method.
ISSN:	1051-2004 1095-4333
DOI:	10.1016/j.dsp.2023.104132