Emotion-Aware Talking Face Generation Based on 3DMM

Bibliographic Details
Published in: 2024 4th International Conference on Neural Networks, Information and Communication (NNICE), pp. 1808-1813
Main Authors: Chen, Xinyu; Tang, Sheng
Format: Conference Proceeding
Language: English
Published: IEEE, 19.01.2024
Subjects: 3DMM; Aerospace electronics; content information; Deep learning; emotion information; facial expressions; Feature extraction; lip movements; Lips; Quality assessment; Three-dimensional displays; Transformer; Transformers
Online Access: https://ieeexplore.ieee.org/document/10498924
DOI: 10.1109/NNICE61279.2024.10498924

Abstract: Current methods for generating talking face videos based on deep learning mainly focus on the correlation between lip movements and audio content. Although these methods achieve high generation quality and good audio-visual synchronization, they ignore facial expressions in talking face videos. To solve this problem, this paper proposes the Audio to Expression Network (A2ENet), an emotional talking face video generation framework based on 3DMM, which generates talking face videos with facial expressions in an audio-driven way. Firstly, A2ENet uses two Transformer-based encoders to extract audio features and applies a cross-reconstruction emotion disentanglement method to decompose the audio into a latent space of content information and a latent space of emotion information; a Transformer decoder then integrates these two feature spaces. After that, the proposed method predicts the 3D expression coefficients that match the emotion of the audio, and finally uses a renderer to generate the talking face video. By using eye control parameters, A2ENet can control the eye movements of the talking face. A2ENet associates the initial 3D expression coefficients with specific individuals to retain the identity information of the reference face. Experimental results show that our method can generate talking face videos with appropriate facial expressions, and achieve more accurate lip movements and better video quality.
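
The abstract outlines an audio-driven pipeline: two Transformer-based encoders extract content and emotion features from the audio, a Transformer decoder integrates the two feature spaces, and the fused representation is mapped to 3DMM expression coefficients that a renderer turns into video frames. The sketch below shows one plausible way such an encoder/decoder arrangement could be wired up in PyTorch; the module names, layer counts, feature dimensions, and the 64-coefficient output size are illustrative assumptions rather than details from the paper, and the cross-reconstruction disentanglement training, eye control parameters, and renderer are omitted.

```python
# Minimal sketch of the two-encoder / one-decoder arrangement described in the
# abstract. All names, dimensions, and layer counts are illustrative
# assumptions, not the paper's actual configuration.
import torch
import torch.nn as nn

class A2ENetSketch(nn.Module):
    def __init__(self, audio_dim=80, d_model=256, n_exp_coeffs=64):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        # Two Transformer-based encoders: one for content, one for emotion
        # (TransformerEncoder deep-copies the layer, so the weights are separate).
        self.content_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.emotion_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        # Transformer decoder that integrates the content and emotion feature spaces.
        self.fusion_decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        # Regression head for per-frame 3DMM expression coefficients.
        self.exp_head = nn.Linear(d_model, n_exp_coeffs)

    def forward(self, audio_feats):
        # audio_feats: (batch, frames, audio_dim), e.g. mel-spectrogram frames.
        x = self.audio_proj(audio_feats)
        content = self.content_encoder(x)
        emotion = self.emotion_encoder(x)
        # Content features act as queries that attend over the emotion features.
        fused = self.fusion_decoder(tgt=content, memory=emotion)
        return self.exp_head(fused)  # (batch, frames, n_exp_coeffs)

# Example: a 2-second clip at 25 fps -> 50 frames of 80-dim audio features.
coeffs = A2ENetSketch()(torch.randn(1, 50, 80))
print(coeffs.shape)  # torch.Size([1, 50, 64])
```
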
Author Details
– Chen, Xinyu (cxy808@gs.zzu.edu.cn), Zhengzhou University, Henan Institute of Advanced Technology, Zhengzhou, China
– Tang, Sheng (ts@ict.ac.cn), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
EISBN: 9798350394375