Video Coding for Machines: Compact Visual Representation Compression for Intelligent Collaborative Analytics

As an emerging research practice leveraging recent advanced AI techniques, e.g., deep-model-based prediction and generation, Video Coding for Machines (VCM) is committed to bridging the so-far largely separate research tracks of video/image compression and feature compression, and attempts to optimiz...

Bibliographic Details
Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 46, No. 7, pp. 5174-5191
Main Authors Yang, Wenhan; Huang, Haofeng; Hu, Yueyu; Duan, Ling-Yu; Liu, Jiaying
Format Journal Article
Language English
Published United States IEEE 01.07.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online Access Get full text
ISSN 0162-8828; 1939-3539
EISSN 2160-9292; 1939-3539
DOI 10.1109/TPAMI.2024.3367293

Abstract As an emerging research practice leveraging recent advanced AI techniques, e.g., deep-model-based prediction and generation, Video Coding for Machines (VCM) is committed to bridging the so-far largely separate research tracks of video/image compression and feature compression, and attempts to optimize compactness and efficiency jointly from a unified perspective of high-accuracy machine vision and full-fidelity human vision. With the rapid advances of deep feature representation and visual data compression in mind, in this paper we summarize VCM methodology and philosophy based on existing academic and industrial efforts. The development of VCM follows a general rate-distortion optimization, and a categorization of its key modules and techniques is established, including feature-assisted coding, scalable coding, intermediate feature compression/optimization, and machine-vision-targeted codecs, viewed from the broader perspectives of vision tasks, analytics resources, etc. Previous works demonstrate that, although existing methods attempt to reveal the nature of scalable representation in bits when dealing with machine and human vision tasks, the generality of low-bit-rate representations, and accordingly how to support a variety of visual analytics tasks, remains rarely studied. Therefore, we investigate a novel visual information compression approach for the analytics taxonomy problem, to strengthen the capability of compact visual representations extracted from multiple tasks for visual analytics. A new perspective on task relationships versus compression is revisited. Keeping in mind the transferability among different machine vision tasks (e.g., high-level semantic and mid-level geometry-related tasks), we aim to support multiple tasks jointly at low bit rates. In particular, to narrow the dimensionality gap between neural-network-generated features extracted from pixels and a variety of machine vision features/labels (e.g., scene classes, segmentation labels), a codebook hyperprior is designed to compress the neural-network-generated features. As demonstrated in our experiments, this new hyperprior model improves feature compression efficiency by estimating the signal entropy more accurately, which enables further investigation of the granularity of abstracting compact features among different tasks.
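The codebook-hyperprior idea in the abstract can be illustrated with a numpy-only toy sketch. This is not the paper's model; the codebook, feature dimensions, and Gaussian entropy models below are all illustrative assumptions. The point it demonstrates: transmitting a nearest-codeword index as side information lets the residual be coded with a much sharper conditional distribution, lowering the estimated code length versus a single unconditional entropy model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D "deep features" drawn around two cluster centers, standing in for
# the task-dependent structure a learned codebook would capture (hypothetical).
codebook = np.array([[-3.0, -3.0], [3.0, 3.0]])
feats = np.vstack([
    codebook[0] + 0.5 * rng.standard_normal((500, 2)),
    codebook[1] + 0.5 * rng.standard_normal((500, 2)),
])

def gauss_bits(x, mu, sigma):
    """Per-element ideal code length -log2 N(x; mu, sigma^2), in bits."""
    return 0.5 * np.log2(2 * np.pi * sigma**2) + (x - mu) ** 2 / (2 * sigma**2 * np.log(2))

# Baseline: one factorized Gaussian entropy model fit over all features.
bits_plain = gauss_bits(feats, feats.mean(0), feats.std(0)).sum()

# Codebook hyperprior: send the nearest-codeword index as side information,
# then code only the residual under a much sharper zero-mean Gaussian.
idx = np.argmin(((feats[:, None, :] - codebook[None]) ** 2).sum(-1), axis=1)
resid = feats - codebook[idx]
bits_hyper = (gauss_bits(resid, 0.0, resid.std(0)).sum()
              + len(feats) * np.log2(len(codebook)))  # cost of the indices

print(f"plain model: {bits_plain:.0f} bits; codebook hyperprior: {bits_hyper:.0f} bits")
assert bits_hyper < bits_plain  # sharper entropy estimate -> fewer bits
```

On clustered features the conditional model's residual scale (about 0.5 here) is far smaller than the marginal scale (about 3), so the per-element savings dwarf the one index bit per vector; with unstructured features the gap would shrink or vanish.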
Author Duan, Ling-Yu
Huang, Haofeng
Liu, Jiaying
Hu, Yueyu
Yang, Wenhan
Author_xml – sequence: 1
  givenname: Wenhan
  orcidid: 0000-0002-1692-0069
  surname: Yang
  fullname: Yang, Wenhan
  email: yangwenhan@pku.edu.cn
  organization: Peking University, Beijing, China
– sequence: 2
  givenname: Haofeng
  orcidid: 0000-0002-1480-7388
  surname: Huang
  fullname: Huang, Haofeng
  email: huang6013@pku.edu.cn
  organization: Peking University, Beijing, China
– sequence: 3
  givenname: Yueyu
  orcidid: 0000-0003-4919-4515
  surname: Hu
  fullname: Hu, Yueyu
  email: huyy@pku.edu.cn
  organization: Peking University, Beijing, China
– sequence: 4
  givenname: Ling-Yu
  orcidid: 0000-0002-4491-2023
  surname: Duan
  fullname: Duan, Ling-Yu
  email: lingyu@pcl.ac.cn
  organization: National Engineering Research Center of Visual Technology, School of Computer Science, Peking University, Beijing, China
– sequence: 5
  givenname: Jiaying
  orcidid: 0000-0002-0468-9576
  surname: Liu
  fullname: Liu, Jiaying
  email: liujiaying@pku.edu.cn
  organization: Peking University, Beijing, China
BackLink https://www.ncbi.nlm.nih.gov/pubmed/38376966 (View this record in MEDLINE/PubMed)
BookMark eNp9kU9vEzEQxS3UiqaFL4AQWolLLxvG_73coohCpFZFqPRqeb1OceXYwd5F6rfHaQKqeuhpbM_vjUfvnaKjmKJD6B2GOcbQfbr5vrhazQkQNqdUSNLRV2hGsIC2Ix05QjPAgrRKEXWCTku5B8CMA32NTqiiUnRCzFC49YNLzTINPt4165SbK2N_-ejK5_q42Ro7Nre-TCY0P9w2u-LiaEaf4mO33svuvNOt4uhC8HcVqL0QTJ9yJf-4ZhFNeBi9LW_Q8dqE4t4e6hn6efHlZvmtvbz-ulouLltLORlbNyjeOd5LTKXBhjJGiKwVE6l4bzEo6LDEagBpqLWKWmXBSCFhINArQs_Q-X7uNqffkyuj3vhi63YmujQVXd3pOAMJvKIfn6H3acp14aIpCCZBAGeV-nCgpn7jBr3NfmPyg_7nYwXIHrA5lZLd-j-CQe_C0o9h6V1Y-hBWFalnIuv35o7Z-PCy9P1e6p1zT_5iDDgh9C8TlKDx
CODEN ITPIDJ
CitedBy_id crossref_primary_10_1186_s13640_024_00647_y
crossref_primary_10_1109_JETCAS_2024_3524260
crossref_primary_10_1109_JSAC_2024_3460078
crossref_primary_10_1109_ACCESS_2025_3549316
crossref_primary_10_1038_s41598_025_85602_1
crossref_primary_10_1109_TCSVT_2024_3467124
Cites_doi 10.1145/3394171.3413968
10.1109/CVPR46437.2021.00991
10.1109/ICIP40778.2020.9191184
10.1109/TCSVT.2021.3104305
10.1109/ICCV.2017.244
10.1109/ICME51207.2021.9428417
10.1109/CVPR.2016.90
10.1109/DCC50243.2021.00024
10.1145/3343031.3350874
10.5555/2969033.2969125
10.1109/TIP.2019.2941660
10.1109/ICIP40778.2020.9191247
10.1109/ICIP40778.2020.9190860
10.1109/ICME46284.2020.9102843
10.1109/cvprw53098.2021.00271
10.1109/MSP.2014.2371951
10.1109/TIP.2022.3160602
10.1109/ICASSP40776.2020.9053011
10.1109/ICME46284.2020.9102750
10.1109/TMM.2021.3094300
10.1109/ICIP.2019.8803255
10.1109/DCC50243.2021.00057
10.1109/TMM.2020.2966885
10.1109/TMM.2021.3068580
10.1109/DCC47342.2020.00044
10.1109/ICASSP.2019.8682641
10.1109/CVPRW50498.2020.00088
10.1109/TIP.2021.3060875
10.1109/ICME51207.2021.9428224
10.1109/ICASSP40776.2020.9054770
10.1109/TIP.2020.3016485
10.1109/TCSVT.2003.815165
10.1109/ICME51207.2021.9428366
10.1109/ICASSP39728.2021.9413603
10.1109/CVPR42600.2020.01013
10.1007/978-3-319-10602-1_48
10.1109/TCSVT.2012.2221191
10.1109/ICASSP39728.2021.9413943
10.1007/978-3-030-58565-5_19
10.1109/VCIP49819.2020.9301807
10.1145/3343031.3350849
10.1145/1274871.1274888
10.1109/ICASSP40776.2020.9054527
10.1109/ICME51207.2021.9428228
10.1109/CVPR46437.2021.01641
10.1109/ICIP.2019.8803275
10.1109/ICIP.2019.8803110
10.1109/ICIP40778.2020.9190843
10.1002/047174882x
10.1109/ICME51207.2021.9428258
10.1109/ICASSP.2019.8683541
10.1109/ICIP40778.2020.9190933
10.1109/TPAMI.2021.3054719
10.1109/CVPR42600.2020.00813
10.1109/CVPR.2016.91
10.1109/JIOT.2020.3039359
10.1109/ICASSP39728.2021.9414465
10.1109/ICME46284.2020.9102810
10.1109/CVPR42600.2020.00271
10.1109/ICIP.2019.8803805
10.1109/ICCV.2015.169
10.1109/CVPR42600.2020.00796
10.1007/s11263-021-01491-7
10.1109/ICASSP39728.2021.9413506
10.1109/CVPR.2018.00391
10.1109/ICASSP40776.2020.9054165
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DBID 97E
RIA
RIE
AAYXX
CITATION
NPM
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7X8
DOI 10.1109/TPAMI.2024.3367293
DatabaseName IEEE Xplore (IEEE)
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
PubMed
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitle CrossRef
PubMed
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitleList
MEDLINE - Academic
Technology Research Database
PubMed
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 2160-9292
1939-3539
EndPage 5191
ExternalDocumentID 38376966
10_1109_TPAMI_2024_3367293
10440522
Genre orig-research
Journal Article
GrantInformation_xml – fundername: Fuzhou Chengtou New Infrastructure Group
– fundername: Boyun Vision Company Ltd.
– fundername: National Natural Science Foundation of China
  grantid: 62332010; 62088102
  funderid: 10.13039/501100001809
– fundername: PKU-NTU Joint Research Institute
– fundername: AI Joint Lab of Future Urban Infrastructure
– fundername: Ng Teng Fong Charitable Foundation
  funderid: 10.13039/501100018807
GroupedDBID ---
-DZ
-~X
.DC
0R~
29I
4.4
53G
5GY
5VS
6IK
97E
9M8
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABFSI
ABQJQ
ABVLG
ACGFO
ACGFS
ACIWK
ACNCT
ADRHT
AENEX
AETEA
AETIX
AGQYO
AGSQL
AHBIQ
AI.
AIBXA
AKJIK
AKQYR
ALLEH
ALMA_UNASSIGNED_HOLDINGS
ASUFR
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CS3
DU5
E.L
EBS
EJD
F5P
FA8
HZ~
H~9
IBMZZ
ICLAB
IEDLZ
IFIPE
IFJZH
IPLJI
JAVBF
LAI
M43
MS~
O9-
OCL
P2P
PQQKQ
RIA
RIE
RNI
RNS
RXW
RZB
TAE
TN5
UHB
VH1
XJT
~02
AAYOK
AAYXX
CITATION
RIG
NPM
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7X8
ID FETCH-LOGICAL-c352t-ed859e5b7137a1a3442271a312785bc108091718d07a3cc83c8c0a7670d20b823
IEDL.DBID RIE
ISSN 0162-8828
1939-3539
IngestDate Fri Jul 11 10:36:03 EDT 2025
Sun Jun 29 12:14:58 EDT 2025
Thu Apr 03 07:00:53 EDT 2025
Tue Jul 01 01:43:09 EDT 2025
Thu Apr 24 22:51:59 EDT 2025
Wed Aug 27 02:06:04 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 7
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c352t-ed859e5b7137a1a3442271a312785bc108091718d07a3cc83c8c0a7670d20b823
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ORCID 0000-0003-4919-4515
0000-0002-1480-7388
0000-0002-4491-2023
0000-0002-0468-9576
0000-0002-1692-0069
PMID 38376966
PQID 3064706054
PQPubID 85458
PageCount 18
ParticipantIDs ieee_primary_10440522
crossref_primary_10_1109_TPAMI_2024_3367293
pubmed_primary_38376966
proquest_miscellaneous_2929540705
proquest_journals_3064706054
crossref_citationtrail_10_1109_TPAMI_2024_3367293
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2024-07-01
PublicationDateYYYYMMDD 2024-07-01
PublicationDate_xml – month: 07
  year: 2024
  text: 2024-07-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: New York
PublicationTitle IEEE transactions on pattern analysis and machine intelligence
PublicationTitleAbbrev TPAMI
PublicationTitleAlternate IEEE Trans Pattern Anal Mach Intell
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
ref57
ref12
ref56
ref15
ref59
ref14
ref58
ref53
ref11
Simonyan (ref52)
ref55
ref10
ref54
ref17
ref19
ref18
Bellard (ref7) 2021
ref51
ref50
ref46
ref45
ref48
ref47
ref42
ref41
ref44
Ballé (ref6)
ref49
ref8
ref9
ref4
ref3
ref5
Krizhevsky (ref36)
ref35
ref34
ref37
ref31
ref75
ref30
ref74
ref33
ref32
ref76
ref2
ref1
ref39
ref38
Chun (ref16)
Locatello (ref43)
ref71
ref70
ref73
ref72
ref24
ref68
ref23
ref67
Lin (ref40)
ref25
ref69
ref20
ref64
ref63
ref66
ref21
ref65
ref28
ref27
ref29
(ref26) 2021
Gao (ref22) 2021
ref60
ref62
ref61
References_xml – ident: ref71
  doi: 10.1145/3394171.3413968
– ident: ref63
  doi: 10.1109/CVPR46437.2021.00991
– ident: ref69
  doi: 10.1109/ICIP40778.2020.9191184
– ident: ref23
  doi: 10.1109/TCSVT.2021.3104305
– ident: ref76
  doi: 10.1109/ICCV.2017.244
– ident: ref32
  doi: 10.1109/ICME51207.2021.9428417
– start-page: 10129
  volume-title: Proc. AAAI Conf. Artif. Intell.
  ident: ref40
  article-title: Enhancing unsupervised video representation learning by decoupling the scene and the motion
– ident: ref27
  doi: 10.1109/CVPR.2016.90
– ident: ref8
  doi: 10.1109/DCC50243.2021.00024
– ident: ref39
  doi: 10.1145/3343031.3350874
– ident: ref25
  doi: 10.5555/2969033.2969125
– ident: ref14
  doi: 10.1109/TIP.2019.2941660
– ident: ref48
  doi: 10.1109/ICIP40778.2020.9191247
– ident: ref54
  doi: 10.1109/ICIP40778.2020.9190860
– ident: ref66
  doi: 10.1109/ICME46284.2020.9102843
– ident: ref46
  doi: 10.1109/cvprw53098.2021.00271
– start-page: 1
  volume-title: Proc. Int. Conf. Learn. Representations
  ident: ref6
  article-title: Variational image compression with a scale hyperprior
– ident: ref45
  doi: 10.1109/MSP.2014.2371951
– year: 2021
  ident: ref26
  article-title: Draft of white paper on motivation and requirements for video coding for machine
– ident: ref17
  doi: 10.1109/TIP.2022.3160602
– ident: ref18
  doi: 10.1109/ICASSP40776.2020.9053011
– ident: ref30
  doi: 10.1109/ICME46284.2020.9102750
– ident: ref62
  doi: 10.1109/TMM.2021.3094300
– ident: ref60
  doi: 10.1109/ICIP.2019.8803255
– ident: ref49
  doi: 10.1109/DCC50243.2021.00057
– start-page: 1106
  volume-title: Proc. Annu. Conf. Neural Inf. Process. Syst.
  ident: ref36
  article-title: ImageNet classification with deep convolutional neural networks
– ident: ref44
  doi: 10.1109/TMM.2020.2966885
– ident: ref70
  doi: 10.1109/TMM.2021.3068580
– start-page: 1
  volume-title: Proc. Int. Conf. Learn. Representations
  ident: ref52
  article-title: Very deep convolutional networks for large-scale image recognition
– ident: ref68
  doi: 10.1109/DCC47342.2020.00044
– ident: ref11
  doi: 10.1109/ICASSP.2019.8682641
– ident: ref28
  doi: 10.1109/CVPRW50498.2020.00088
– ident: ref53
  doi: 10.1109/ICIP40778.2020.9190860
– ident: ref4
  doi: 10.1109/TIP.2021.3060875
– ident: ref38
  doi: 10.1109/ICME51207.2021.9428224
– start-page: 4114
  volume-title: Proc. Int. Conf. Mach. Learn.
  ident: ref43
  article-title: Challenging common assumptions in the unsupervised learning of disentangled representations
– ident: ref3
  doi: 10.1109/ICASSP40776.2020.9054770
– ident: ref21
  doi: 10.1109/TIP.2020.3016485
– ident: ref64
  doi: 10.1109/TCSVT.2003.815165
– ident: ref10
  doi: 10.1109/ICME51207.2021.9428366
– ident: ref58
  doi: 10.1109/ICASSP39728.2021.9413603
– ident: ref29
  doi: 10.1109/CVPR42600.2020.01013
– ident: ref41
  doi: 10.1007/978-3-319-10602-1_48
– ident: ref55
  doi: 10.1109/TCSVT.2012.2221191
– ident: ref5
  doi: 10.1109/ICASSP39728.2021.9413943
– ident: ref19
  doi: 10.1007/978-3-030-58565-5_19
– ident: ref31
  doi: 10.1109/VCIP49819.2020.9301807
– ident: ref13
  doi: 10.1145/3343031.3350849
– ident: ref47
  doi: 10.1145/1274871.1274888
– ident: ref51
  doi: 10.1109/ICASSP40776.2020.9054527
– ident: ref33
  doi: 10.1109/ICME51207.2021.9428228
– ident: ref67
  doi: 10.1109/CVPR46437.2021.01641
– ident: ref56
  doi: 10.1109/ICIP.2019.8803275
– ident: ref2
  doi: 10.1109/ICIP.2019.8803110
– ident: ref12
  doi: 10.1109/ICIP40778.2020.9190843
– ident: ref20
  doi: 10.1002/047174882x
– ident: ref75
  doi: 10.1109/ICME51207.2021.9428258
– ident: ref1
  doi: 10.1109/ICASSP.2019.8683541
– ident: ref57
  doi: 10.1109/ICIP40778.2020.9190933
– ident: ref59
  doi: 10.1109/TPAMI.2021.3054719
– start-page: 7936
  volume-title: Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. Workshops
  ident: ref16
  article-title: Learned prior information for image compression
– ident: ref34
  doi: 10.1109/CVPR42600.2020.00813
– ident: ref50
  doi: 10.1109/CVPR.2016.91
– ident: ref74
  doi: 10.1109/JIOT.2020.3039359
– year: 2021
  ident: ref7
  article-title: BPG image format
– ident: ref37
  doi: 10.1109/ICASSP39728.2021.9414465
– ident: ref65
  doi: 10.1109/ICME46284.2020.9102810
– ident: ref72
  doi: 10.1109/CVPR42600.2020.00271
– ident: ref9
  doi: 10.1109/ICIP.2019.8803805
– ident: ref24
  doi: 10.1109/ICCV.2015.169
– ident: ref15
  doi: 10.1109/CVPR42600.2020.00796
– ident: ref42
  doi: 10.1007/s11263-021-01491-7
– ident: ref61
  doi: 10.1109/ICASSP39728.2021.9413506
– year: 2021
  ident: ref22
  article-title: Recent Standard Development Activities on Video Coding for Machines. arXiv e-prints
– ident: ref73
  doi: 10.1109/CVPR.2018.00391
– ident: ref35
  doi: 10.1109/ICASSP40776.2020.9054165
SSID ssj0014503
Score 2.5258112
Snippet As an emerging research practice leveraging recent advanced AI techniques, e.g. deep models based prediction and generation, V ideo C oding for M achines ( VCM...
As an emerging research practice leveraging recent advanced AI techniques, e.g. deep models based prediction and generation, Video Coding for Machines (VCM) is...
SourceID proquest
pubmed
crossref
ieee
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 5174
SubjectTerms analytics taxonomy
codebook-hyperprior
Codec
Coding
compact visual representation
Data compression
Encoding
Feature extraction
Image coding
Image compression
Labels
Machine vision
multiple tasks
Neural networks
Optimization
Representations
Task analysis
Taxonomy
Video coding
Video coding for machines
Video compression
Vision systems
Visual tasks
Title Video Coding for Machines: Compact Visual Representation Compression for Intelligent Collaborative Analytics
URI https://ieeexplore.ieee.org/document/10440522
https://www.ncbi.nlm.nih.gov/pubmed/38376966
https://www.proquest.com/docview/3064706054
https://www.proquest.com/docview/2929540705
Volume 46
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE