GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition
Published in | IEEE Transactions on Multimedia, Vol. 26, pp. 77-89 |
---|---|
Main Authors | Li, Jiang; Wang, Xiaoping; Lv, Guoqing; Zeng, Zhigang |
Format | Journal Article |
Language | English |
Published | Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2024 |
Subjects | Emotion recognition in conversation; Graph neural networks; Multimodal fusion; Cross-modal feature complementation |
Online Access | https://ieeexplore.ieee.org/document/10078161 |
ISSN | 1520-9210 (print); 1941-0077 (electronic) |
DOI | 10.1109/TMM.2023.3260635 |
Abstract | Emotion Recognition in Conversation (ERC) plays a significant part in Human-Computer Interaction (HCI) systems since it can provide empathetic services. Multimodal ERC can mitigate the drawbacks of uni-modal approaches. Recently, Graph Neural Networks (GNNs) have been widely used in a variety of fields due to their superior performance in relation modeling. In multimodal ERC, GNNs are capable of extracting both long-distance contextual information and inter-modal interactive information. Unfortunately, since existing methods such as MMGCN directly fuse multiple modalities, redundant information may be generated and diverse information may be lost. In this work, we present a directed Graph based Cross-modal Feature Complementation (GraphCFC) module that can efficiently model contextual and interactive information. GraphCFC alleviates the problem of heterogeneity gap in multimodal fusion by utilizing multiple subspace extractors and Pair-wise Cross-modal Complementary (PairCC) strategy. We extract various types of edges from the constructed graph for encoding, thus enabling GNNs to extract crucial contextual and interactive information more accurately when performing message passing. Furthermore, we design a GNN structure called GAT-MLP, which can provide a new unified network framework for multimodal learning. The experimental results on two benchmark datasets show that our GraphCFC outperforms the state-of-the-art (SOTA) approaches. |
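The abstract describes a "GAT-MLP" structure: graph attention over utterance nodes connected by directed temporal and cross-modal edges, followed by an MLP. The sketch below is purely illustrative and is not the authors' GraphCFC implementation; the single-head attention form, the toy two-utterance/two-modality graph, and all shapes and names are assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(H, A, W, a_src, a_dst):
    """Single-head graph attention: each node aggregates its in-neighbours."""
    Z = H @ W                                        # (N, F') projected features
    s = (Z @ a_src)[:, None] + (Z @ a_dst)[None, :]  # (N, N) pairwise logits
    s = np.where(s > 0, s, 0.2 * s)                  # LeakyReLU
    s = np.where(A > 0, s, -1e9)                     # keep only real edges
    alpha = softmax(s, axis=1)                       # normalise over neighbours
    return alpha @ Z                                 # attention-weighted sum

def mlp(X, W1, b1, W2, b2):
    """Two-layer ReLU MLP applied node-wise after the attention step."""
    return np.maximum(X @ W1 + b1, 0) @ W2 + b2

# Toy graph: 4 nodes = 2 utterances x 2 modalities (text, audio).
# A[i, j] = 1 means node i attends to node j (a directed edge j -> i).
N, F, H_dim = 4, 6, 8
A = np.eye(N)             # self-loops
A[0, 2] = A[2, 0] = 1     # cross-modal edges, utterance 1 (text <-> audio)
A[1, 3] = A[3, 1] = 1     # cross-modal edges, utterance 2
A[1, 0] = 1               # temporal edge, text:  u1 -> u2
A[3, 2] = 1               # temporal edge, audio: u1 -> u2
H = rng.normal(size=(N, F))

out = gat_layer(H, A, rng.normal(size=(F, H_dim)),
                rng.normal(size=H_dim), rng.normal(size=H_dim))
out = mlp(out, rng.normal(size=(H_dim, H_dim)), np.zeros(H_dim),
          rng.normal(size=(H_dim, H_dim)), np.zeros(H_dim))
print(out.shape)  # (4, 8)
```

Masking non-edges with a large negative logit before the softmax is the standard way to restrict attention to a node's neighbours; distinguishing edge types (temporal vs. cross-modal), as the paper proposes, would require separate parameter sets per edge type.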
Authors | Li, Jiang (ORCID 0000-0002-0116-5662; lijfrank@hust.edu.cn); Wang, Xiaoping (ORCID 0000-0002-4909-8286; wangxiaoping@hust.edu.cn); Lv, Guoqing (ORCID 0009-0004-2966-5724; guoqinglv@hust.edu.cn); Zeng, Zhigang (ORCID 0000-0003-4587-3588; zgzeng@hust.edu.cn). All authors are with the School of Artificial Intelligence and Automation and the Key Laboratory of Image Processing and Intelligent Control of Education Ministry of China, Huazhong University of Science and Technology, Wuhan, China. |
CODEN | ITMUF8 |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
Discipline | Engineering Computer Science |
Genre | orig-research |
GrantInformation | National Natural Science Foundation of China (funder ID 10.13039/501100001809), grants 62236005, 61876209, and 61936004 |
IsPeerReviewed | true |
IsScholarly | true |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
PageCount | 13 |
SubjectTerms | Acoustics; Context modeling; Cross-modal feature complementation; Emotion recognition; Emotion recognition in conversation; Emotions; Graph neural networks; Graph theory; Heterogeneity; Human-computer interface; Machine learning; Message passing; Multimodal fusion; Oral communication; Task analysis; Visualization |