GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition
Published in | IEEE Transactions on Multimedia, Vol. 26, pp. 77-89 |
---|---|
Main Authors | Li, Jiang; Wang, Xiaoping; Lv, Guoqing; Zeng, Zhigang |
Format | Journal Article |
Language | English |
Published | Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2024 |
Subjects | Emotion recognition in conversation; Graph neural networks; Multimodal fusion; Cross-modal feature complementation |
Online Access | https://ieeexplore.ieee.org/document/10078161 |
ISSN | 1520-9210 (print); 1941-0077 (electronic) |
DOI | 10.1109/TMM.2023.3260635 |
Abstract | Emotion Recognition in Conversation (ERC) plays a significant part in Human-Computer Interaction (HCI) systems since it can provide empathetic services. Multimodal ERC can mitigate the drawbacks of uni-modal approaches. Recently, Graph Neural Networks (GNNs) have been widely used in a variety of fields due to their superior performance in relation modeling. In multimodal ERC, GNNs are capable of extracting both long-distance contextual information and inter-modal interactive information. Unfortunately, since existing methods such as MMGCN directly fuse multiple modalities, redundant information may be generated and diverse information may be lost. In this work, we present a directed Graph based Cross-modal Feature Complementation (GraphCFC) module that can efficiently model contextual and interactive information. GraphCFC alleviates the problem of heterogeneity gap in multimodal fusion by utilizing multiple subspace extractors and Pair-wise Cross-modal Complementary (PairCC) strategy. We extract various types of edges from the constructed graph for encoding, thus enabling GNNs to extract crucial contextual and interactive information more accurately when performing message passing. Furthermore, we design a GNN structure called GAT-MLP, which can provide a new unified network framework for multimodal learning. The experimental results on two benchmark datasets show that our GraphCFC outperforms the state-of-the-art (SOTA) approaches. |
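The abstract describes a "GAT-MLP" structure: graph attention over utterance nodes connected by directed temporal and cross-modal edges, followed by an MLP. The sketch below is purely illustrative and is not the authors' GraphCFC implementation; the single-head attention form, the toy two-utterance/two-modality graph, and all shapes and names are assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(H, A, W, a_src, a_dst):
    """Single-head graph attention: each node aggregates its in-neighbours."""
    Z = H @ W                                        # (N, F') projected features
    s = (Z @ a_src)[:, None] + (Z @ a_dst)[None, :]  # (N, N) pairwise logits
    s = np.where(s > 0, s, 0.2 * s)                  # LeakyReLU
    s = np.where(A > 0, s, -1e9)                     # keep only real edges
    alpha = softmax(s, axis=1)                       # normalise over neighbours
    return alpha @ Z                                 # attention-weighted sum

def mlp(X, W1, b1, W2, b2):
    """Two-layer ReLU MLP applied node-wise after the attention step."""
    return np.maximum(X @ W1 + b1, 0) @ W2 + b2

# Toy graph: 4 nodes = 2 utterances x 2 modalities (text, audio).
# A[i, j] = 1 means node i attends to node j (a directed edge j -> i).
N, F, H_dim = 4, 6, 8
A = np.eye(N)             # self-loops
A[0, 2] = A[2, 0] = 1     # cross-modal edges, utterance 1 (text <-> audio)
A[1, 3] = A[3, 1] = 1     # cross-modal edges, utterance 2
A[1, 0] = 1               # temporal edge, text:  u1 -> u2
A[3, 2] = 1               # temporal edge, audio: u1 -> u2
H = rng.normal(size=(N, F))

out = gat_layer(H, A, rng.normal(size=(F, H_dim)),
                rng.normal(size=H_dim), rng.normal(size=H_dim))
out = mlp(out, rng.normal(size=(H_dim, H_dim)), np.zeros(H_dim),
          rng.normal(size=(H_dim, H_dim)), np.zeros(H_dim))
print(out.shape)  # (4, 8)
```

Masking non-edges with a large negative logit before the softmax is the standard way to restrict attention to a node's neighbours; distinguishing edge types (temporal vs. cross-modal), as the paper proposes, would require separate parameter sets per edge type.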
Authors | Li, Jiang (ORCID 0000-0002-0116-5662; lijfrank@hust.edu.cn); Wang, Xiaoping (ORCID 0000-0002-4909-8286; wangxiaoping@hust.edu.cn); Lv, Guoqing (ORCID 0009-0004-2966-5724; guoqinglv@hust.edu.cn); Zeng, Zhigang (ORCID 0000-0003-4587-3588; zgzeng@hust.edu.cn). All authors are with the School of Artificial Intelligence and Automation and the Key Laboratory of Image Processing and Intelligent Control of Education Ministry of China, Huazhong University of Science and Technology, Wuhan, China. |
CODEN | ITMUF8 |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
Discipline | Engineering Computer Science |
Genre | orig-research |
GrantInformation | National Natural Science Foundation of China (funder ID 10.13039/501100001809), grants 62236005, 61876209, and 61936004 |
IsPeerReviewed | true |
IsScholarly | true |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
PageCount | 13 |
SubjectTerms | Acoustics; Context modeling; Cross-modal feature complementation; Emotion recognition; Emotion recognition in conversation; Emotions; Graph neural networks; Graph theory; Heterogeneity; Human-computer interface; Machine learning; Message passing; Multimodal fusion; Oral communication; Task analysis; Visualization |