MMY-Net: a multimodal network exploiting image and patient metadata for simultaneous segmentation and diagnosis
| Published in | Multimedia Systems, Vol. 30, No. 2 |
| --- | --- |
| Main Authors | , , , , , , , , , |
| Format | Journal Article |
| Language | English |
| Published | Berlin/Heidelberg: Springer Berlin Heidelberg, 01.04.2024 (Springer Nature B.V.) |
| Summary | Accurate medical image segmentation can effectively assist disease diagnosis and treatment. While neural networks are often applied to the segmentation problem in recent computer-aided diagnosis, patient metadata is usually neglected. Motivated by this, we propose a medical image segmentation and diagnosis framework that takes advantage of both the image and the patient's metadata, such as gender and age. We present MMY-Net: a new multimodal network for simultaneous tumor segmentation and diagnosis exploiting patient metadata. Our architecture consists of three parts: a visual encoder, a text encoder, and a decoder with a self-attention block. Specifically, we design a text preprocessing block to embed metadata effectively, and the image features and text embedding features are then fused at several layers between the two encoders. Moreover, Interlaced Sparse Self-Attention is added to the decoder to further boost performance. We apply our algorithm to one private dataset (ZJU2) and a second private dataset (LISHUI) for zero-shot validation. Results show that our algorithm combined with metadata outperforms its counterpart without metadata by a large margin for basal cell carcinoma segmentation (14.3% improvement in IoU and 8.5% in Dice on the ZJU2 dataset, and 7.1% in IoU on the LISHUI validation dataset). Additionally, we applied MMY-Net to one public segmentation dataset to demonstrate its general segmentation capability. MMY-Net outperforms the state-of-the-art methods on the GlaS dataset. |
| ISSN | 0942-4962; 1432-1882 |
| DOI | 10.1007/s00530-024-01260-9 |
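The metadata-embedding-and-fusion idea from the summary can be sketched in plain Python. This is an illustrative sketch only: the function names, the one-hot gender encoding, the age normalization, and fusion by concatenation are all assumptions for exposition, not the paper's actual text preprocessing block or layer-wise fusion scheme.

```python
# Hedged sketch: embedding patient metadata (gender, age) and fusing it
# with image features, loosely in the spirit of MMY-Net's text branch.
# All names and dimensions here are illustrative, not from the paper.

def embed_metadata(gender, age):
    """One-hot encode gender and min-max normalize age to [0, 1]."""
    gender_onehot = [1.0, 0.0] if gender == "female" else [0.0, 1.0]
    age_norm = min(max(age, 0), 100) / 100.0  # clamp to a plausible range
    return gender_onehot + [age_norm]

def fuse(image_features, meta_embedding):
    """Toy fusion by concatenation; the paper instead fuses image and
    text embedding features at several layers between the two encoders."""
    return image_features + meta_embedding

# Stand-in for a visual encoder's output feature vector.
image_features = [0.2, 0.7, 0.1, 0.9]
meta = embed_metadata("female", 63)
fused = fuse(image_features, meta)
print(fused)  # [0.2, 0.7, 0.1, 0.9, 1.0, 0.0, 0.63]
```

In the actual network, both branches would produce learned embeddings and the fused features would feed a decoder with a self-attention block; the sketch only shows why the metadata must first be mapped into a numeric vector before any such fusion is possible.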