A multi-modal approach for identifying schizophrenia using cross-modal attention
Published in | 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) Vol. 2024; pp. 1 - 5 |
---|---|
Main Authors | Premananth, Gowtham; Siriwarden, Yashish M.; Resnik, Philip; Espy-Wilson, Carol |
Format | Conference Proceeding Journal Article |
Language | English |
Published | United States: IEEE, 01.07.2024 |
Abstract | This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system using audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio respectively, which were then used to compute high-level coordination features that served as the inputs from the audio and video modalities. Context-independent text embeddings extracted from transcriptions of speech were used as the input for the text modality. The multi-modal system is developed by fusing a segment-to-session-level classifier for video and audio modalities with a text model based on a Hierarchical Attention Network (HAN), with cross-modal attention. The proposed multi-modal system outperforms the previous state-of-the-art multi-modal system by 8.53% in the weighted average F1 score. |
---|---|
Author | Siriwarden, Yashish M.; Premananth, Gowtham; Espy-Wilson, Carol; Resnik, Philip |
Author_xml | 1. Premananth, Gowtham (gowtham8@umd.edu), University of Maryland, Department of Electrical and Computer Engineering, College Park, Maryland, USA; 2. Siriwarden, Yashish M. (yashish@umd.edu), University of Maryland, Department of Electrical and Computer Engineering, College Park, Maryland, USA; 3. Resnik, Philip (resnik@umd.edu), University of Maryland, Institute for Advanced Computer Studies, College Park, Maryland, USA; 4. Espy-Wilson, Carol (espy@umd.edu), University of Maryland, Department of Electrical and Computer Engineering, College Park, Maryland, USA |
DOI | 10.1109/EMBC53108.2024.10782297 |
EISBN | 9798350371499 |
EISSN | 2694-0604 |
EndPage | 5 |
ExternalDocumentID | 40039545 10782297 |
Genre | orig-research Journal Article |
GrantInformation_xml | National Science Foundation (funder ID: 10.13039/100000001) |
PMID | 40039545 |
PublicationTitle | 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) |
PublicationTitleAbbrev | EMBC |
PublicationTitleAlternate | Annu Int Conf IEEE Eng Med Biol Soc |
SubjectTerms | Adult; Algorithms; Attention; Audio-visual systems; Biological system modeling; Computational modeling; Engineering in medicine and biology; Facial Action units; Feature extraction; Female; Humans; Male; Multi-modal model; Schizophrenia; Schizophrenia - diagnosis; Schizophrenia - physiopathology; Text Embeddings; Vocal tract variables |
Title | A multi-modal approach for identifying schizophrenia using cross-modal attention |
URI | https://ieeexplore.ieee.org/document/10782297 https://www.ncbi.nlm.nih.gov/pubmed/40039545 |
Volume | 2024 |
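
The abstract describes fusing the audio and video classifier with the text model through cross-modal attention. As a rough illustration only, the sketch below shows generic scaled dot-product attention in which the queries come from one modality and the keys and values come from another. The function names, the toy feature vectors, and the choice of text as the query modality are assumptions made for this example; the record does not describe the authors' architecture at this level of detail.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention across two modalities.

    queries: list of d-dimensional vectors from one modality (assumed: text).
    keys, values: equal-length lists of vectors from another modality
                  (assumed: audio); keys must also be d-dimensional.
    Returns (attended, weights), with one output vector and one row of
    attention weights per query.
    """
    dim = len(queries[0])
    value_dim = len(values[0])
    attended, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the other modality's value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(value_dim)
        ]
        attended.append(out)
        all_weights.append(weights)
    return attended, all_weights


# Hypothetical toy features: 2 text vectors attending over 3 audio vectors.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
audio_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, weights = cross_modal_attention(text_feats, audio_feats, audio_feats)
```

Each row of `weights` sums to 1, so every attended vector is a convex combination of the other modality's features. A trained system would normally apply learned projections to the queries, keys, and values first; those are omitted here to keep the sketch short.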