MMY-Net: a multimodal network exploiting image and patient metadata for simultaneous segmentation and diagnosis
| Published in | Multimedia Systems, Vol. 30, No. 2 |
| --- | --- |
| Main Authors | , , , , , , , , , |
| Format | Journal Article |
| Language | English |
| Published | Berlin/Heidelberg: Springer Berlin Heidelberg, 01.04.2024 (Springer Nature B.V.) |
| Summary | Accurate medical image segmentation can effectively assist disease diagnosis and treatment. While neural networks are often applied to the segmentation problem in recent computer-aided diagnosis, patient metadata is usually neglected. Motivated by this, we propose a medical image segmentation and diagnosis framework that takes advantage of both the image and the patient's metadata, such as gender and age. We present MMY-Net: a new multimodal network for simultaneous tumor segmentation and diagnosis exploiting patient metadata. Our architecture consists of three parts: a visual encoder, a text encoder, and a decoder with a self-attention block. Specifically, we design a text preprocessing block to embed metadata effectively, and the image features and text embedding features are then fused at several layers between the two encoders. Moreover, Interlaced Sparse Self-Attention is added to the decoder to further boost performance. We apply our algorithm to one private dataset (ZJU2) and a second private dataset (LISHUI) for zero-shot validation. Results show that our algorithm combined with metadata outperforms its counterpart without metadata by a large margin for basal cell carcinoma segmentation (14.3% improvement in IoU and 8.5% in Dice on the ZJU2 dataset, and 7.1% in IoU on the LISHUI validation dataset). Additionally, we applied MMY-Net to one public segmentation dataset to demonstrate its general segmentation capability. MMY-Net outperforms the state-of-the-art methods on the GlaS dataset. |
| ISSN | 0942-4962; 1432-1882 |
| DOI | 10.1007/s00530-024-01260-9 |
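The metadata-embedding-and-fusion idea from the summary can be sketched in plain Python. This is an illustrative sketch only: the function names, the one-hot gender encoding, the age normalization, and fusion by concatenation are all assumptions for exposition, not the paper's actual text preprocessing block or layer-wise fusion scheme.

```python
# Hedged sketch: embedding patient metadata (gender, age) and fusing it
# with image features, loosely in the spirit of MMY-Net's text branch.
# All names and dimensions here are illustrative, not from the paper.

def embed_metadata(gender, age):
    """One-hot encode gender and min-max normalize age to [0, 1]."""
    gender_onehot = [1.0, 0.0] if gender == "female" else [0.0, 1.0]
    age_norm = min(max(age, 0), 100) / 100.0  # clamp to a plausible range
    return gender_onehot + [age_norm]

def fuse(image_features, meta_embedding):
    """Toy fusion by concatenation; the paper instead fuses image and
    text embedding features at several layers between the two encoders."""
    return image_features + meta_embedding

# Stand-in for a visual encoder's output feature vector.
image_features = [0.2, 0.7, 0.1, 0.9]
meta = embed_metadata("female", 63)
fused = fuse(image_features, meta)
print(fused)  # [0.2, 0.7, 0.1, 0.9, 1.0, 0.0, 0.63]
```

In the actual network, both branches would produce learned embeddings and the fused features would feed a decoder with a self-attention block; the sketch only shows why the metadata must first be mapped into a numeric vector before any such fusion is possible.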