A multi-modal approach for identifying schizophrenia using cross-modal attention

Bibliographic Details
Published in 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Vol. 2024, pp. 1-5
Main Authors Premananth, Gowtham; Siriwarden, Yashish M.; Resnik, Philip; Espy-Wilson, Carol
Format Conference Proceeding; Journal Article
Language English
Published United States IEEE 01.07.2024
Abstract This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system using audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio respectively, which were then used to compute high-level coordination features that served as the inputs from the audio and video modalities. Context-independent text embeddings extracted from transcriptions of speech were used as the input for the text modality. The multi-modal system is developed by fusing a segment-to-session-level classifier for video and audio modalities with a text model based on a Hierarchical Attention Network (HAN), with cross-modal attention. The proposed multi-modal system outperforms the previous state-of-the-art multi-modal system by 8.53% in the weighted average F1 score.
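The fusion step named in the abstract, cross-modal attention, can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not describe their layer sizes, learned projections, or which modality attends to which. The sketch below assumes plain scaled dot-product attention with no learned weights, and the names `cross_modal_attention`, `text_feats`, and `audio_feats` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention across two modalities.

    queries: list of Tq vectors (dimension d) from the attending modality
    keys:    list of Tk vectors (dimension d) from the attended modality
    values:  list of Tk vectors (dimension dv) from the attended modality
    Returns a list of Tq vectors of dimension dv.
    Assumption: no learned projection matrices, unlike a trained model.
    """
    scale = math.sqrt(len(queries[0]))
    dv = len(values[0])
    attended = []
    for q in queries:
        # Similarity of this query to every key in the other modality.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the other modality's value vectors.
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dv)]
        )
    return attended


# Hypothetical toy features: 2 text steps attending over 3 audio segments.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
audio_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_attending_audio = cross_modal_attention(text_feats, audio_feats, audio_feats)
```

Each output row is a convex combination of the audio vectors, so the text stream receives one audio summary per text step. A trained system would typically add learned query, key, and value projections and feed the result to a classifier.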
Author_xml – sequence: 1
  givenname: Gowtham
  surname: Premananth
  fullname: Premananth, Gowtham
  email: gowtham8@umd.edu
  organization: University of Maryland,Department of Electrical and Computer Engineering,College Park,Maryland,USA
– sequence: 2
  givenname: Yashish M.
  surname: Siriwarden
  fullname: Siriwarden, Yashish M.
  email: yashish@umd.edu
  organization: University of Maryland,Department of Electrical and Computer Engineering,College Park,Maryland,USA
– sequence: 3
  givenname: Philip
  surname: Resnik
  fullname: Resnik, Philip
  email: resnik@umd.edu
  organization: University of Maryland,Institute for Advanced Computer Studies,College Park,Maryland,USA
– sequence: 4
  givenname: Carol
  surname: Espy-Wilson
  fullname: Espy-Wilson, Carol
  email: espy@umd.edu
  organization: University of Maryland,Department of Electrical and Computer Engineering,College Park,Maryland,USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/40039545 (View this record in MEDLINE/PubMed)
ContentType Conference Proceeding
Journal Article
DOI 10.1109/EMBC53108.2024.10782297
EISBN 9798350371499
EISSN 2694-0604
EndPage 5
ExternalDocumentID 40039545
10782297
Genre orig-research
Journal Article
GrantInformation_xml – fundername: National Science Foundation
  funderid: 10.13039/100000001
IsPeerReviewed false
IsScholarly false
Language English
PMID 40039545
PageCount 5
ParticipantIDs pubmed_primary_40039545
ieee_primary_10782297
PublicationCentury 2000
PublicationDate 2024-Jul
PublicationDateYYYYMMDD 2024-07-01
PublicationDate_xml – month: 07
  year: 2024
  text: 2024-Jul
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
PublicationTitleAbbrev EMBC
PublicationTitleAlternate Annu Int Conf IEEE Eng Med Biol Soc
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 1
SubjectTerms Adult
Algorithms
Attention
Audio-visual systems
Biological system modeling
Computational modeling
Engineering in medicine and biology
Facial Action units
Feature extraction
Female
Humans
Male
Multi-modal model
Schizophrenia
Schizophrenia - diagnosis
Schizophrenia - physiopathology
Text Embeddings
Vocal tract variables
Title A multi-modal approach for identifying schizophrenia using cross-modal attention
URI https://ieeexplore.ieee.org/document/10782297
https://www.ncbi.nlm.nih.gov/pubmed/40039545
Volume 2024