Multi-Modality Deep Restoration of Extremely Compressed Face Videos

Bibliographic Details
Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, no. 2, pp. 2024-2037
Main Authors Zhang, Xi; Wu, Xiaolin
Format Journal Article
Language English
Published United States IEEE 01.02.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online Access Get full text

Abstract Arguably the most common and salient object in daily video communication is the talking head, as encountered in social media, virtual classrooms, teleconferences, news broadcasting, talk shows, etc. When communication bandwidth is limited by network congestion or cost considerations, compression artifacts in talking-head videos are inevitable. The resulting degradation of video quality is highly visible and objectionable because of the high acuity of the human visual system to faces. To solve this problem, we develop a multi-modality deep convolutional neural network (DCNN) method for restoring face videos that are aggressively compressed. The main innovation is a new DCNN architecture that incorporates known priors of multiple modalities: the video-synchronized speech signal and semantic elements of the compression code stream, including motion vectors, the code partition map and quantization parameters. These priors correlate strongly with the latent video and hence enhance the ability of deep learning to remove compression artifacts. Ample empirical evidence is presented to validate the superior performance of the proposed DCNN method on face videos over existing state-of-the-art methods.
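The abstract describes, at a high level, a restoration network that fuses the decoded frames with two kinds of priors: the synchronized speech signal and side information parsed from the compression code stream (motion vectors, partition map, quantization parameters). As a rough illustration of that fusion idea only, the following PyTorch sketch combines a decoded frame, rasterized codec side information, and a per-frame audio embedding to predict a residual correction. All layer sizes, the concatenation-based fusion, and the module names are assumptions made for illustration; this is not the authors' actual architecture.

# Minimal, illustrative sketch (not the paper's architecture): fuse a decoded
# frame with codec side information and a speech embedding in a small CNN.
import torch
import torch.nn as nn

class MultiModalRestorationSketch(nn.Module):
    def __init__(self, audio_dim=128, base_ch=64):
        super().__init__()
        # Encode the decoded (artifact-laden) frame.
        self.frame_enc = nn.Sequential(
            nn.Conv2d(3, base_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, base_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Codec side info: 2-channel motion vectors + 1-channel partition map
        # + 1-channel per-block QP, rasterized to the frame resolution.
        self.side_enc = nn.Sequential(
            nn.Conv2d(4, base_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Audio prior: a per-frame speech embedding broadcast over space.
        self.audio_proj = nn.Linear(audio_dim, base_ch)
        # Fuse all modalities and predict a residual correction to the frame.
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * base_ch, base_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, base_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, 3, 3, padding=1),
        )

    def forward(self, frame, side_info, audio_feat):
        # frame:      (B, 3, H, W) decoded video frame
        # side_info:  (B, 4, H, W) motion vectors, partition map, QP map
        # audio_feat: (B, audio_dim) speech embedding synced to the frame
        f = self.frame_enc(frame)
        s = self.side_enc(side_info)
        a = self.audio_proj(audio_feat)        # (B, base_ch)
        a = a[:, :, None, None].expand_as(f)   # broadcast over H, W
        residual = self.fuse(torch.cat([f, s, a], dim=1))
        return frame + residual                # restored frame

if __name__ == "__main__":
    net = MultiModalRestorationSketch()
    frame = torch.rand(1, 3, 128, 128)
    side = torch.rand(1, 4, 128, 128)
    audio = torch.rand(1, 128)
    print(net(frame, side, audio).shape)  # torch.Size([1, 3, 128, 128])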
Author Wu, Xiaolin
Zhang, Xi
Author_xml – sequence: 1
  givenname: Xi
  orcidid: 0000-0002-1993-6031
  surname: Zhang
  fullname: Zhang, Xi
  email: zhangxi_19930818@sjtu.edu.cn
  organization: Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
– sequence: 2
  givenname: Xiaolin
  orcidid: 0000-0002-0103-5374
  surname: Wu
  fullname: Wu, Xiaolin
  email: xwu@ece.mcmaster.ca
  organization: Department of Electrical & Computer Engineering, McMaster University, Hamilton, Ontario, Canada
BackLink https://www.ncbi.nlm.nih.gov/pubmed/35259095 (View this record in MEDLINE/PubMed)
CODEN ITPIDJ
CitedBy_id crossref_primary_10_1587_transfun_2022EAL2039
crossref_primary_10_1109_TMM_2023_3264882
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
DOI 10.1109/TPAMI.2022.3157388
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE/IET Electronic Library
PubMed
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitle PubMed
CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitleList Technology Research Database
PubMed

Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 1939-3539
2160-9292
EndPage 2037
ExternalDocumentID 10_1109_TPAMI_2022_3157388
35259095
9730053
Genre orig-research
Journal Article
GrantInformation_xml – fundername: National Natural Science Foundation of China
  funderid: 10.13039/501100001809
– fundername: Natural Sciences and Engineering Research Council of Canada
  funderid: 10.13039/501100000038
ISSN 0162-8828
IsPeerReviewed true
IsScholarly true
Issue 2
Language English
ORCID 0000-0002-1993-6031
0000-0002-0103-5374
PMID 35259095
PQID 2761374694
PQPubID 85458
PageCount 14
PublicationCentury 2000
PublicationDate 2023-02-01
PublicationDateYYYYMMDD 2023-02-01
PublicationDate_xml – month: 02
  year: 2023
  text: 2023-02-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: New York
PublicationTitle IEEE transactions on pattern analysis and machine intelligence
PublicationTitleAbbrev TPAMI
PublicationTitleAlternate IEEE Trans Pattern Anal Mach Intell
PublicationYear 2023
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SourceID proquest
crossref
pubmed
ieee
SourceType Aggregation Database
Index Database
Publisher
StartPage 2024
SubjectTerms Acuity
Artificial neural networks
Cost effectiveness
Deep neural networks
Face recognition
face videos
Faces
Image coding
Image restoration
multi-modality
Salience
Task analysis
Video communication
Video compression
video restoration
Videos
Title Multi-Modality Deep Restoration of Extremely Compressed Face Videos
URI https://ieeexplore.ieee.org/document/9730053
https://www.ncbi.nlm.nih.gov/pubmed/35259095
https://www.proquest.com/docview/2761374694
https://search.proquest.com/docview/2637581660
Volume 45