A multi-modal approach for identifying schizophrenia using cross-modal attention

Bibliographic Details
Published in: 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Vol. 2024, pp. 1-5
Main Authors: Premananth, Gowtham; Siriwardena, Yashish M.; Resnik, Philip; Espy-Wilson, Carol
Format: Conference Proceeding, Journal Article
Language: English
Published: IEEE, United States, 01.07.2024
More Information
Summary: This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system using audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio, respectively, and were then used to compute high-level coordination features that served as the inputs for the audio and video modalities. Context-independent text embeddings extracted from transcriptions of speech were used as the input for the text modality. The multi-modal system was developed by fusing a segment-to-session-level classifier for the video and audio modalities with a text model based on a Hierarchical Attention Network (HAN), using cross-modal attention. The proposed multi-modal system outperforms the previous state-of-the-art multi-modal system by 8.53% in the weighted average F1 score.
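To make the fusion step concrete, below is a minimal sketch of cross-modal attention between a text stream and audio/video coordination features. This is not the authors' implementation: the class name, feature dimensions, pooling choices, and the use of PyTorch's nn.MultiheadAttention are all assumptions standing in for whatever attention variant and fusion details the paper actually uses.

```python
# Hypothetical sketch of cross-modal attention fusion, NOT the paper's code.
# Assumes per-session audio/video coordination features and HAN-style text
# embeddings have already been computed; all dimensions are illustrative.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4, n_classes: int = 2):
        super().__init__()
        # Text embeddings attend over the audio/video feature sequence
        # (query = text, key/value = audio+video); one of several plausible
        # cross-modal attention arrangements.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, n_classes)

    def forward(self, text_emb: torch.Tensor, av_feats: torch.Tensor):
        # text_emb: (batch, n_sentences, dim) from a HAN-style text encoder
        # av_feats: (batch, n_segments, dim)  coordination features
        attended, _ = self.attn(text_emb, av_feats, av_feats)
        # Pool each stream to a session-level vector and fuse by concatenation.
        fused = torch.cat([attended.mean(dim=1), av_feats.mean(dim=1)], dim=-1)
        return self.classifier(fused)  # healthy control vs. schizophrenia

# Usage with random stand-in tensors (2 sessions, 10 sentences, 30 segments):
model = CrossModalFusion()
logits = model(torch.randn(2, 10, 128), torch.randn(2, 30, 128))
```

The sketch illustrates the general pattern described in the summary: one modality's representations query another's before a session-level classifier is applied; the paper's segment-to-session aggregation and exact fusion topology may differ.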
ISSN: 2694-0604
DOI: 10.1109/EMBC53108.2024.10782297