GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition

Bibliographic Details
Published in IEEE transactions on multimedia, Vol. 26, pp. 77-89
Main Authors Li, Jiang; Wang, Xiaoping; Lv, Guoqing; Zeng, Zhigang
Format Journal Article
Language English
Published Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2024
ISSN 1520-9210
EISSN 1941-0077
DOI 10.1109/TMM.2023.3260635

Abstract Emotion Recognition in Conversation (ERC) plays a significant part in Human-Computer Interaction (HCI) systems since it can provide empathetic services. Multimodal ERC can mitigate the drawbacks of uni-modal approaches. Recently, Graph Neural Networks (GNNs) have been widely used in a variety of fields due to their superior performance in relation modeling. In multimodal ERC, GNNs are capable of extracting both long-distance contextual information and inter-modal interactive information. Unfortunately, because existing methods such as MMGCN directly fuse multiple modalities, redundant information may be generated and diverse information may be lost. In this work, we present a directed Graph based Cross-modal Feature Complementation (GraphCFC) module that can efficiently model contextual and interactive information. GraphCFC alleviates the heterogeneity gap in multimodal fusion by utilizing multiple subspace extractors and a Pair-wise Cross-modal Complementary (PairCC) strategy. We extract various types of edges from the constructed graph for encoding, thus enabling GNNs to extract crucial contextual and interactive information more accurately when performing message passing. Furthermore, we design a GNN structure called GAT-MLP, which provides a new unified network framework for multimodal learning. Experimental results on two benchmark datasets show that GraphCFC outperforms state-of-the-art (SOTA) approaches.
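The abstract names two concrete mechanisms that a short code illustration can clarify: a conversation graph with multiple edge types, and the GAT-MLP structure. The sketch below, in plain PyTorch, is illustrative only and is not the authors' implementation: it assumes one node per (utterance, modality) pair, "context" edges within a fixed past window, "cross" edges linking the same utterance across modalities, plus self-loops; the window size, modality names, single-head attention, and the Transformer-style residual-plus-LayerNorm arrangement are all assumptions.

# Illustrative sketch: typed-edge conversation graph and a GAT-MLP block.
# Node layout, edge types, window size, and layer sizes are assumptions
# made for exposition, not the paper's exact configuration.
from itertools import combinations

import torch
import torch.nn as nn
import torch.nn.functional as F


def build_typed_edges(num_utterances, modalities=("text", "audio", "visual"), window=2):
    """One node per (utterance, modality); returns (src, dst, edge_type) triples."""
    node = {(i, m): i * len(modalities) + k
            for i in range(num_utterances)
            for k, m in enumerate(modalities)}
    edges = []
    for i in range(num_utterances):
        for m in modalities:                        # self-loops keep softmax well defined
            edges.append((node[(i, m)], node[(i, m)], "self"))
        for m1, m2 in combinations(modalities, 2):  # cross-modal, same utterance
            edges.append((node[(i, m1)], node[(i, m2)], "cross"))
            edges.append((node[(i, m2)], node[(i, m1)], "cross"))
        for j in range(max(0, i - window), i):      # intra-modal context window
            for m in modalities:
                edges.append((node[(j, m)], node[(i, m)], "context"))
    return edges


class GATMLPBlock(nn.Module):
    """Graph attention followed by a feed-forward MLP, each with residual + LayerNorm."""

    def __init__(self, dim, mlp_ratio=4):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)
        self.attn = nn.Linear(2 * dim, 1, bias=False)   # scores a^T [h_i || h_j]
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
                                 nn.Linear(mlp_ratio * dim, dim))

    def forward(self, x, adj):
        # x: (N, dim) node features; adj: (N, N) 0/1 mask that includes self-loops.
        h = self.proj(x)
        n = h.size(0)
        pair = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                          h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.attn(pair).squeeze(-1))
        e = e.masked_fill(adj == 0, float("-inf"))      # attend along edges only
        x = self.norm1(x + torch.softmax(e, dim=-1) @ h)
        return self.norm2(x + self.mlp(x))

A smoke test under the same assumptions: three utterances over three modalities give nine nodes. Note that flattening the typed edge list into a single adjacency mask, as done below, discards exactly the per-edge-type information GraphCFC encodes; a faithful version would run one attention pass per edge type and merge the results.

edges = build_typed_edges(num_utterances=3)             # 9 nodes
adj = torch.zeros(9, 9)
for s, d, _ in edges:                                   # type-agnostic mask for the demo
    adj[s, d] = adj[d, s] = 1.0
out = GATMLPBlock(dim=64)(torch.randn(9, 64), adj)      # -> shape (9, 64)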
Author Lv, Guoqing
Li, Jiang
Zeng, Zhigang
Wang, Xiaoping
Author_xml – sequence: 1
  givenname: Jiang
  orcidid: 0000-0002-0116-5662
  surname: Li
  fullname: Li, Jiang
  email: lijfrank@hust.edu.cn
  organization: School of Artificial Intelligence and Automation and the Key Laboratory of Image Processing and Intelligent Control of Education Ministry of China, Huazhong University of Science and Technology, Wuhan, China
– sequence: 2
  givenname: Xiaoping
  orcidid: 0000-0002-4909-8286
  surname: Wang
  fullname: Wang, Xiaoping
  email: wangxiaoping@hust.edu.cn
  organization: School of Artificial Intelligence and Automation and the Key Laboratory of Image Processing and Intelligent Control of Education Ministry of China, Huazhong University of Science and Technology, Wuhan, China
– sequence: 3
  givenname: Guoqing
  orcidid: 0009-0004-2966-5724
  surname: Lv
  fullname: Lv, Guoqing
  email: guoqinglv@hust.edu.cn
  organization: School of Artificial Intelligence and Automation and the Key Laboratory of Image Processing and Intelligent Control of Education Ministry of China, Huazhong University of Science and Technology, Wuhan, China
– sequence: 4
  givenname: Zhigang
  orcidid: 0000-0003-4587-3588
  surname: Zeng
  fullname: Zeng, Zhigang
  email: zgzeng@hust.edu.cn
  organization: School of Artificial Intelligence and Automation and the Key Laboratory of Image Processing and Intelligent Control of Education Ministry of China, Huazhong University of Science and Technology, Wuhan, China
CODEN ITMUF8
CitedBy_id crossref_primary_10_1007_s11432_023_3908_6
crossref_primary_10_4018_IJDSST_352398
crossref_primary_10_1007_s00530_024_01618_z
crossref_primary_10_1007_s12559_024_10287_z
crossref_primary_10_1109_TMM_2024_3521669
crossref_primary_10_3389_fcomp_2024_1304687
crossref_primary_10_1109_TCSS_2024_3420445
crossref_primary_10_1016_j_neucom_2024_128937
crossref_primary_10_1109_TCSS_2024_3409715
crossref_primary_10_1016_j_ins_2024_121393
crossref_primary_10_1109_ACCESS_2023_3348518
crossref_primary_10_1109_TIP_2024_3504298
crossref_primary_10_1016_j_knosys_2025_113029
crossref_primary_10_3390_electronics13132645
crossref_primary_10_1016_j_jksuci_2023_101791
crossref_primary_10_1109_TCSVT_2024_3405406
crossref_primary_10_1109_TASLP_2024_3434495
crossref_primary_10_3390_electronics13214208
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DOI 10.1109/TMM.2023.3260635
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
DatabaseTitleList Technology Research Database

Discipline Engineering
Computer Science
EISSN 1941-0077
EndPage 89
ExternalDocumentID 10_1109_TMM_2023_3260635
10078161
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62236005; 61876209; 61936004
  funderid: 10.13039/501100001809
ISSN 1520-9210
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0003-4587-3588
0000-0002-0116-5662
0000-0002-4909-8286
0009-0004-2966-5724
PQID 2912942644
PQPubID 75737
PageCount 13
PublicationDate 2024
PublicationPlace Piscataway
PublicationTitle IEEE transactions on multimedia
PublicationTitleAbbrev TMM
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 77
SubjectTerms Acoustics
Context modeling
cross-modal feature complementation
Emotion recognition
Emotion recognition in conversation
Emotions
Graph neural networks
Graph theory
Heterogeneity
Human-computer interface
Machine learning
Message passing
multimodal fusion
Oral communication
Task analysis
Visualization
Title GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition
URI https://ieeexplore.ieee.org/document/10078161
https://www.proquest.com/docview/2912942644
Volume 26