A multimodal transformer to fuse images and metadata for skin disease classification
| Published in | The Visual Computer, Vol. 39, No. 7, pp. 2781–2793 |
|---|---|
| Main Authors | , , , , , |
| Format | Journal Article |
| Language | English |
| Published | Berlin/Heidelberg: Springer Berlin Heidelberg, 01.07.2023 (Springer Nature B.V.) |
Summary: Skin disease cases are rising in prevalence, and diagnosing skin diseases remains a challenging task in the clinic. Utilizing deep learning to diagnose skin diseases could help meet these challenges. In this study, a novel neural network is proposed for the classification of skin diseases. Since the research datasets consist of skin disease images and clinical metadata, we propose a novel multimodal Transformer, which consists of two encoders, one for images and one for metadata, and one decoder to fuse the multimodal information. In the proposed network, a suitable Vision Transformer (ViT) model serves as the backbone to extract deep image features. The metadata are treated as labels, and a new Soft Label Encoder (SLE) is designed to embed them. Furthermore, in the decoder, a novel Mutual Attention (MA) block is proposed to better fuse the image features and metadata features. To evaluate the model's effectiveness, extensive experiments were conducted on a private skin disease dataset and the benchmark ISIC 2018 dataset. Compared with state-of-the-art methods, the proposed model shows better performance and represents an advancement in skin disease diagnosis.
| ISSN | 0178-2789; 1432-2315 |
|---|---|
| DOI | 10.1007/s00371-022-02492-4 |