Multi-Modality Deep Restoration of Extremely Compressed Face Videos
Published in | IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, No. 2, pp. 2024-2037 |
Main Authors | Zhang, Xi; Wu, Xiaolin |
Format | Journal Article |
Language | English |
Published | United States: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.02.2023 |
Abstract | Arguably the most common and salient object in daily video communications is the talking head, as encountered in social media, virtual classrooms, teleconferences, news broadcasting, talk shows, etc. When communication bandwidth is limited by network congestion or cost constraints, compression artifacts in talking-head videos are inevitable. The resulting video quality degradation is highly visible and objectionable due to the high acuity of the human visual system to faces. To solve this problem, we develop a multi-modality deep convolutional neural network (DCNN) method for restoring face videos that are aggressively compressed. The main innovation is a new DCNN architecture that incorporates known priors of multiple modalities: the video-synchronized speech signal and semantic elements of the compression code stream, including motion vectors, the code partition map, and quantization parameters. These priors correlate strongly with the latent video and hence enhance the capability of deep learning to remove compression artifacts. Ample empirical evidence is presented to validate the superior performance of the proposed DCNN method on face videos over existing state-of-the-art methods. |
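To make the multi-modality idea in the abstract concrete, the sketch below shows one plausible way to fuse the three kinds of inputs it names: the decoded (artifact-laden) frame, a synchronized speech embedding, and codec side information such as quantization-parameter, partition and motion-vector maps rasterized to frame resolution. This is a minimal PyTorch illustration under assumed shapes and hypothetical module names (MultiModalRestorer, side_channels, audio_dim); it is not the architecture published in the paper.

```python
# Illustrative sketch only: a minimal multi-modality fusion block in PyTorch.
# All module and tensor names here are hypothetical, not taken from the paper.
import torch
import torch.nn as nn


class MultiModalRestorer(nn.Module):
    """Predicts a residual correction for a compressed face frame, conditioned
    on an audio embedding and per-pixel codec side information (e.g. QP map,
    partition map, motion vectors)."""

    def __init__(self, side_channels=4, audio_dim=128, feat=64):
        super().__init__()
        # Video branch: encode the decoded, artifact-laden frame.
        self.video_enc = nn.Sequential(
            nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Side-information branch: codec maps stacked as channels at frame resolution.
        self.side_enc = nn.Sequential(
            nn.Conv2d(side_channels, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Audio branch: a global speech embedding, projected then broadcast spatially.
        self.audio_fc = nn.Linear(audio_dim, feat)
        # Fusion and reconstruction: predict a residual added back to the input frame.
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 3, 3, padding=1),
        )

    def forward(self, frame, audio_emb, side_info):
        b, _, h, w = frame.shape
        fv = self.video_enc(frame)                      # (B, F, H, W)
        fs = self.side_enc(side_info)                   # (B, F, H, W)
        fa = self.audio_fc(audio_emb)                   # (B, F)
        fa = fa.view(b, -1, 1, 1).expand(-1, -1, h, w)  # broadcast over the spatial grid
        fused = torch.cat([fv, fs, fa], dim=1)
        return frame + self.fuse(fused)                 # residual restoration


if __name__ == "__main__":
    net = MultiModalRestorer()
    frame = torch.rand(1, 3, 128, 128)    # decoded, heavily compressed frame
    audio = torch.rand(1, 128)            # synchronized speech embedding
    side = torch.rand(1, 4, 128, 128)     # QP / partition / motion-vector maps
    print(net(frame, audio, side).shape)  # torch.Size([1, 3, 128, 128])
```

Broadcasting a single global audio embedding over the spatial grid is just one simple fusion choice; aligning audio features frame by frame (e.g. via a sliding spectrogram window) would be a natural refinement under the same overall scheme.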
Author | Zhang, Xi (Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China; ORCID: 0000-0002-1993-6031; zhangxi_19930818@sjtu.edu.cn); Wu, Xiaolin (Department of Electrical & Computer Engineering, McMaster University, Hamilton, Ontario, Canada; ORCID: 0000-0002-0103-5374; xwu@ece.mcmaster.ca) |
CODEN | ITPIDJ |
Copyright | The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
DOI | 10.1109/TPAMI.2022.3157388 |
Discipline | Engineering; Computer Science |
EISSN | 1939-3539; 2160-9292 |
Grant Information | National Natural Science Foundation of China (10.13039/501100001809); Natural Sciences and Engineering Research Council of Canada (10.13039/501100000038) |
ISSN | 0162-8828 |
PMID | 35259095 |
Subjects | Acuity; Artificial neural networks; Cost effectiveness; Deep neural networks; Face recognition; face videos; Faces; Image coding; Image restoration; multi-modality; Salience; Task analysis; Video communication; Video compression; video restoration; Videos |
URI | https://ieeexplore.ieee.org/document/9730053; https://www.ncbi.nlm.nih.gov/pubmed/35259095; https://www.proquest.com/docview/2761374694; https://search.proquest.com/docview/2637581660 |