Video Coding for Machines: Compact Visual Representation Compression for Intelligent Collaborative Analytics

As an emerging research practice leveraging recent advanced AI techniques, e.g., deep-model-based prediction and generation, Video Coding for Machines (VCM) is committed to bridging the so-far largely separate research tracks of video/image compression and feature compression, and attempts to optimiz...

Bibliographic Details
Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 46, No. 7, pp. 5174-5191
Main Authors Yang, Wenhan; Huang, Haofeng; Hu, Yueyu; Duan, Ling-Yu; Liu, Jiaying
Format Journal Article
Language English
Published United States IEEE 01.07.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online Access Get full text
ISSN 0162-8828; 1939-3539
EISSN 2160-9292; 1939-3539
DOI 10.1109/TPAMI.2024.3367293

Abstract As an emerging research practice leveraging recent advanced AI techniques, e.g., deep-model-based prediction and generation, Video Coding for Machines (VCM) is committed to bridging the so-far largely separate research tracks of video/image compression and feature compression, and attempts to optimize compactness and efficiency jointly from a unified perspective of high-accuracy machine vision and full-fidelity human vision. With the rapid advances of deep feature representation and visual data compression in mind, in this paper we summarize VCM methodology and philosophy based on existing academic and industrial efforts. The development of VCM follows a general rate-distortion optimization, and a categorization of its key modules and techniques is established, including feature-assisted coding, scalable coding, intermediate feature compression/optimization, and machine-vision-targeted codecs, viewed from the broader perspectives of vision tasks, analytics resources, etc. Previous works demonstrate that, although existing methods attempt to reveal the nature of scalable representation in bits when dealing with machine and human vision tasks, the generality of low-bit-rate representations, and accordingly how to support a variety of visual analytics tasks, remains rarely studied. Therefore, we investigate a novel visual information compression approach for the analytics taxonomy problem, to strengthen the capability of compact visual representations extracted from multiple tasks for visual analytics. A new perspective on task relationships versus compression is revisited. Keeping in mind the transferability among different machine vision tasks (e.g., high-level semantic and mid-level geometry-related tasks), we aim to support multiple tasks jointly at low bit rates. In particular, to narrow the dimensionality gap between neural-network-generated features extracted from pixels and a variety of machine vision features/labels (e.g., scene classes, segmentation labels), a codebook hyperprior is designed to compress the neural-network-generated features. As demonstrated in our experiments, this new hyperprior model improves feature compression efficiency by estimating the signal entropy more accurately, which enables further investigation of the granularity of abstracting compact features among different tasks.
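The codebook-hyperprior idea in the abstract can be illustrated with a numpy-only toy sketch. This is not the paper's model; the codebook, feature dimensions, and Gaussian entropy models below are all illustrative assumptions. The point it demonstrates: transmitting a nearest-codeword index as side information lets the residual be coded with a much sharper conditional distribution, lowering the estimated code length versus a single unconditional entropy model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D "deep features" drawn around two cluster centers, standing in for
# the task-dependent structure a learned codebook would capture (hypothetical).
codebook = np.array([[-3.0, -3.0], [3.0, 3.0]])
feats = np.vstack([
    codebook[0] + 0.5 * rng.standard_normal((500, 2)),
    codebook[1] + 0.5 * rng.standard_normal((500, 2)),
])

def gauss_bits(x, mu, sigma):
    """Per-element ideal code length -log2 N(x; mu, sigma^2), in bits."""
    return 0.5 * np.log2(2 * np.pi * sigma**2) + (x - mu) ** 2 / (2 * sigma**2 * np.log(2))

# Baseline: one factorized Gaussian entropy model fit over all features.
bits_plain = gauss_bits(feats, feats.mean(0), feats.std(0)).sum()

# Codebook hyperprior: send the nearest-codeword index as side information,
# then code only the residual under a much sharper zero-mean Gaussian.
idx = np.argmin(((feats[:, None, :] - codebook[None]) ** 2).sum(-1), axis=1)
resid = feats - codebook[idx]
bits_hyper = (gauss_bits(resid, 0.0, resid.std(0)).sum()
              + len(feats) * np.log2(len(codebook)))  # cost of the indices

print(f"plain model: {bits_plain:.0f} bits; codebook hyperprior: {bits_hyper:.0f} bits")
assert bits_hyper < bits_plain  # sharper entropy estimate -> fewer bits
```

On clustered features the conditional model's residual scale (about 0.5 here) is far smaller than the marginal scale (about 3), so the per-element savings dwarf the one index bit per vector; with unstructured features the gap would shrink or vanish.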
Author Duan, Ling-Yu
Huang, Haofeng
Liu, Jiaying
Hu, Yueyu
Yang, Wenhan
Author_xml – sequence: 1
  givenname: Wenhan
  orcidid: 0000-0002-1692-0069
  surname: Yang
  fullname: Yang, Wenhan
  email: yangwenhan@pku.edu.cn
  organization: Peking University, Beijing, China
– sequence: 2
  givenname: Haofeng
  orcidid: 0000-0002-1480-7388
  surname: Huang
  fullname: Huang, Haofeng
  email: huang6013@pku.edu.cn
  organization: Peking University, Beijing, China
– sequence: 3
  givenname: Yueyu
  orcidid: 0000-0003-4919-4515
  surname: Hu
  fullname: Hu, Yueyu
  email: huyy@pku.edu.cn
  organization: Peking University, Beijing, China
– sequence: 4
  givenname: Ling-Yu
  orcidid: 0000-0002-4491-2023
  surname: Duan
  fullname: Duan, Ling-Yu
  email: lingyu@pcl.ac.cn
  organization: National Engineering Research Center of Visual Technology, School of Computer Science, Peking University, Beijing, China
– sequence: 5
  givenname: Jiaying
  orcidid: 0000-0002-0468-9576
  surname: Liu
  fullname: Liu, Jiaying
  email: liujiaying@pku.edu.cn
  organization: Peking University, Beijing, China
BackLink https://www.ncbi.nlm.nih.gov/pubmed/38376966 (View this record in MEDLINE/PubMed)
BookMark eNp9kU9vEzEQxS3UiqaFL4AQWolLLxvG_73coohCpFZFqPRqeb1OceXYwd5F6rfHaQKqeuhpbM_vjUfvnaKjmKJD6B2GOcbQfbr5vrhazQkQNqdUSNLRV2hGsIC2Ix05QjPAgrRKEXWCTku5B8CMA32NTqiiUnRCzFC49YNLzTINPt4165SbK2N_-ejK5_q42Ro7Nre-TCY0P9w2u-LiaEaf4mO33svuvNOt4uhC8HcVqL0QTJ9yJf-4ZhFNeBi9LW_Q8dqE4t4e6hn6efHlZvmtvbz-ulouLltLORlbNyjeOd5LTKXBhjJGiKwVE6l4bzEo6LDEagBpqLWKWmXBSCFhINArQs_Q-X7uNqffkyuj3vhi63YmujQVXd3pOAMJvKIfn6H3acp14aIpCCZBAGeV-nCgpn7jBr3NfmPyg_7nYwXIHrA5lZLd-j-CQe_C0o9h6V1Y-hBWFalnIuv35o7Z-PCy9P1e6p1zT_5iDDgh9C8TlKDx
CODEN ITPIDJ
CitedBy_id crossref_primary_10_1186_s13640_024_00647_y
crossref_primary_10_1109_JETCAS_2024_3524260
crossref_primary_10_1109_JSAC_2024_3460078
crossref_primary_10_1109_ACCESS_2025_3549316
crossref_primary_10_1038_s41598_025_85602_1
crossref_primary_10_1109_TCSVT_2024_3467124
Cites_doi 10.1145/3394171.3413968
10.1109/CVPR46437.2021.00991
10.1109/ICIP40778.2020.9191184
10.1109/TCSVT.2021.3104305
10.1109/ICCV.2017.244
10.1109/ICME51207.2021.9428417
10.1109/CVPR.2016.90
10.1109/DCC50243.2021.00024
10.1145/3343031.3350874
10.5555/2969033.2969125
10.1109/TIP.2019.2941660
10.1109/ICIP40778.2020.9191247
10.1109/ICIP40778.2020.9190860
10.1109/ICME46284.2020.9102843
10.1109/cvprw53098.2021.00271
10.1109/MSP.2014.2371951
10.1109/TIP.2022.3160602
10.1109/ICASSP40776.2020.9053011
10.1109/ICME46284.2020.9102750
10.1109/TMM.2021.3094300
10.1109/ICIP.2019.8803255
10.1109/DCC50243.2021.00057
10.1109/TMM.2020.2966885
10.1109/TMM.2021.3068580
10.1109/DCC47342.2020.00044
10.1109/ICASSP.2019.8682641
10.1109/CVPRW50498.2020.00088
10.1109/TIP.2021.3060875
10.1109/ICME51207.2021.9428224
10.1109/ICASSP40776.2020.9054770
10.1109/TIP.2020.3016485
10.1109/TCSVT.2003.815165
10.1109/ICME51207.2021.9428366
10.1109/ICASSP39728.2021.9413603
10.1109/CVPR42600.2020.01013
10.1007/978-3-319-10602-1_48
10.1109/TCSVT.2012.2221191
10.1109/ICASSP39728.2021.9413943
10.1007/978-3-030-58565-5_19
10.1109/VCIP49819.2020.9301807
10.1145/3343031.3350849
10.1145/1274871.1274888
10.1109/ICASSP40776.2020.9054527
10.1109/ICME51207.2021.9428228
10.1109/CVPR46437.2021.01641
10.1109/ICIP.2019.8803275
10.1109/ICIP.2019.8803110
10.1109/ICIP40778.2020.9190843
10.1002/047174882x
10.1109/ICME51207.2021.9428258
10.1109/ICASSP.2019.8683541
10.1109/ICIP40778.2020.9190933
10.1109/TPAMI.2021.3054719
10.1109/CVPR42600.2020.00813
10.1109/CVPR.2016.91
10.1109/JIOT.2020.3039359
10.1109/ICASSP39728.2021.9414465
10.1109/ICME46284.2020.9102810
10.1109/CVPR42600.2020.00271
10.1109/ICIP.2019.8803805
10.1109/ICCV.2015.169
10.1109/CVPR42600.2020.00796
10.1007/s11263-021-01491-7
10.1109/ICASSP39728.2021.9413506
10.1109/CVPR.2018.00391
10.1109/ICASSP40776.2020.9054165
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DBID 97E
RIA
RIE
AAYXX
CITATION
NPM
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7X8
DOI 10.1109/TPAMI.2024.3367293
DatabaseName IEEE Xplore (IEEE)
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
PubMed
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitle CrossRef
PubMed
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitleList
MEDLINE - Academic
Technology Research Database
PubMed
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 2160-9292
1939-3539
EndPage 5191
ExternalDocumentID 38376966
10_1109_TPAMI_2024_3367293
10440522
Genre orig-research
Journal Article
GrantInformation_xml – fundername: Fuzhou Chengtou New Infrastructure Group
– fundername: Boyun Vision Company Ltd.
– fundername: National Natural Science Foundation of China
  grantid: 62332010; 62088102
  funderid: 10.13039/501100001809
– fundername: PKU-NTU Joint Research Institute
– fundername: AI Joint Lab of Future Urban Infrastructure
– fundername: Ng Teng Fong Charitable Foundation
  funderid: 10.13039/501100018807
GroupedDBID ---
-DZ
-~X
.DC
0R~
29I
4.4
53G
5GY
5VS
6IK
97E
9M8
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABFSI
ABQJQ
ABVLG
ACGFO
ACGFS
ACIWK
ACNCT
ADRHT
AENEX
AETEA
AETIX
AGQYO
AGSQL
AHBIQ
AI.
AIBXA
AKJIK
AKQYR
ALLEH
ALMA_UNASSIGNED_HOLDINGS
ASUFR
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CS3
DU5
E.L
EBS
EJD
F5P
FA8
HZ~
H~9
IBMZZ
ICLAB
IEDLZ
IFIPE
IFJZH
IPLJI
JAVBF
LAI
M43
MS~
O9-
OCL
P2P
PQQKQ
RIA
RIE
RNI
RNS
RXW
RZB
TAE
TN5
UHB
VH1
XJT
~02
AAYOK
AAYXX
CITATION
RIG
NPM
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7X8
ID FETCH-LOGICAL-c352t-ed859e5b7137a1a3442271a312785bc108091718d07a3cc83c8c0a7670d20b823
IEDL.DBID RIE
ISSN 0162-8828
1939-3539
IngestDate Fri Jul 11 10:36:03 EDT 2025
Sun Jun 29 12:14:58 EDT 2025
Thu Apr 03 07:00:53 EDT 2025
Tue Jul 01 01:43:09 EDT 2025
Thu Apr 24 22:51:59 EDT 2025
Wed Aug 27 02:06:04 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 7
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c352t-ed859e5b7137a1a3442271a312785bc108091718d07a3cc83c8c0a7670d20b823
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ORCID 0000-0003-4919-4515
0000-0002-1480-7388
0000-0002-4491-2023
0000-0002-0468-9576
0000-0002-1692-0069
PMID 38376966
PQID 3064706054
PQPubID 85458
PageCount 18
ParticipantIDs ieee_primary_10440522
crossref_primary_10_1109_TPAMI_2024_3367293
pubmed_primary_38376966
proquest_miscellaneous_2929540705
proquest_journals_3064706054
crossref_citationtrail_10_1109_TPAMI_2024_3367293
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2024-07-01
PublicationDateYYYYMMDD 2024-07-01
PublicationDate_xml – month: 07
  year: 2024
  text: 2024-07-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: New York
PublicationTitle IEEE transactions on pattern analysis and machine intelligence
PublicationTitleAbbrev TPAMI
PublicationTitleAlternate IEEE Trans Pattern Anal Mach Intell
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
ref57
ref12
ref56
ref15
ref59
ref14
ref58
ref53
ref11
Simonyan (ref52)
ref55
ref10
ref54
ref17
ref19
ref18
Bellard (ref7) 2021
ref51
ref50
ref46
ref45
ref48
ref47
ref42
ref41
ref44
Ballé (ref6)
ref49
ref8
ref9
ref4
ref3
ref5
Krizhevsky (ref36)
ref35
ref34
ref37
ref31
ref75
ref30
ref74
ref33
ref32
ref76
ref2
ref1
ref39
ref38
Chun (ref16)
Locatello (ref43)
ref71
ref70
ref73
ref72
ref24
ref68
ref23
ref67
Lin (ref40)
ref25
ref69
ref20
ref64
ref63
ref66
ref21
ref65
ref28
ref27
ref29
(ref26) 2021
Gao (ref22) 2021
ref60
ref62
ref61
References_xml – ident: ref71
  doi: 10.1145/3394171.3413968
– ident: ref63
  doi: 10.1109/CVPR46437.2021.00991
– ident: ref69
  doi: 10.1109/ICIP40778.2020.9191184
– ident: ref23
  doi: 10.1109/TCSVT.2021.3104305
– ident: ref76
  doi: 10.1109/ICCV.2017.244
– ident: ref32
  doi: 10.1109/ICME51207.2021.9428417
– start-page: 10129
  volume-title: Proc. AAAI Conf. Artif. Intell.
  ident: ref40
  article-title: Enhancing unsupervised video representation learning by decoupling the scene and the motion
– ident: ref27
  doi: 10.1109/CVPR.2016.90
– ident: ref8
  doi: 10.1109/DCC50243.2021.00024
– ident: ref39
  doi: 10.1145/3343031.3350874
– ident: ref25
  doi: 10.5555/2969033.2969125
– ident: ref14
  doi: 10.1109/TIP.2019.2941660
– ident: ref48
  doi: 10.1109/ICIP40778.2020.9191247
– ident: ref54
  doi: 10.1109/ICIP40778.2020.9190860
– ident: ref66
  doi: 10.1109/ICME46284.2020.9102843
– ident: ref46
  doi: 10.1109/cvprw53098.2021.00271
– start-page: 1
  volume-title: Proc. Int. Conf. Learn. Representations
  ident: ref6
  article-title: Variational image compression with a scale hyperprior
– ident: ref45
  doi: 10.1109/MSP.2014.2371951
– year: 2021
  ident: ref26
  article-title: Draft of white paper on motivation and requirements for video coding for machine
– ident: ref17
  doi: 10.1109/TIP.2022.3160602
– ident: ref18
  doi: 10.1109/ICASSP40776.2020.9053011
– ident: ref30
  doi: 10.1109/ICME46284.2020.9102750
– ident: ref62
  doi: 10.1109/TMM.2021.3094300
– ident: ref60
  doi: 10.1109/ICIP.2019.8803255
– ident: ref49
  doi: 10.1109/DCC50243.2021.00057
– start-page: 1106
  volume-title: Proc. Annu. Conf. Neural Inf. Process. Syst.
  ident: ref36
  article-title: ImageNet classification with deep convolutional neural networks
– ident: ref44
  doi: 10.1109/TMM.2020.2966885
– ident: ref70
  doi: 10.1109/TMM.2021.3068580
– start-page: 1
  volume-title: Proc. Int. Conf. Learn. Representations
  ident: ref52
  article-title: Very deep convolutional networks for large-scale image recognition
– ident: ref68
  doi: 10.1109/DCC47342.2020.00044
– ident: ref11
  doi: 10.1109/ICASSP.2019.8682641
– ident: ref28
  doi: 10.1109/CVPRW50498.2020.00088
– ident: ref53
  doi: 10.1109/ICIP40778.2020.9190860
– ident: ref4
  doi: 10.1109/TIP.2021.3060875
– ident: ref38
  doi: 10.1109/ICME51207.2021.9428224
– start-page: 4114
  volume-title: Proc. Int. Conf. Mach. Learn.
  ident: ref43
  article-title: Challenging common assumptions in the unsupervised learning of disentangled representations
– ident: ref3
  doi: 10.1109/ICASSP40776.2020.9054770
– ident: ref21
  doi: 10.1109/TIP.2020.3016485
– ident: ref64
  doi: 10.1109/TCSVT.2003.815165
– ident: ref10
  doi: 10.1109/ICME51207.2021.9428366
– ident: ref58
  doi: 10.1109/ICASSP39728.2021.9413603
– ident: ref29
  doi: 10.1109/CVPR42600.2020.01013
– ident: ref41
  doi: 10.1007/978-3-319-10602-1_48
– ident: ref55
  doi: 10.1109/TCSVT.2012.2221191
– ident: ref5
  doi: 10.1109/ICASSP39728.2021.9413943
– ident: ref19
  doi: 10.1007/978-3-030-58565-5_19
– ident: ref31
  doi: 10.1109/VCIP49819.2020.9301807
– ident: ref13
  doi: 10.1145/3343031.3350849
– ident: ref47
  doi: 10.1145/1274871.1274888
– ident: ref51
  doi: 10.1109/ICASSP40776.2020.9054527
– ident: ref33
  doi: 10.1109/ICME51207.2021.9428228
– ident: ref67
  doi: 10.1109/CVPR46437.2021.01641
– ident: ref56
  doi: 10.1109/ICIP.2019.8803275
– ident: ref2
  doi: 10.1109/ICIP.2019.8803110
– ident: ref12
  doi: 10.1109/ICIP40778.2020.9190843
– ident: ref20
  doi: 10.1002/047174882x
– ident: ref75
  doi: 10.1109/ICME51207.2021.9428258
– ident: ref1
  doi: 10.1109/ICASSP.2019.8683541
– ident: ref57
  doi: 10.1109/ICIP40778.2020.9190933
– ident: ref59
  doi: 10.1109/TPAMI.2021.3054719
– start-page: 7936
  volume-title: Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. Workshops
  ident: ref16
  article-title: Learned prior information for image compression
– ident: ref34
  doi: 10.1109/CVPR42600.2020.00813
– ident: ref50
  doi: 10.1109/CVPR.2016.91
– ident: ref74
  doi: 10.1109/JIOT.2020.3039359
– year: 2021
  ident: ref7
  article-title: BPG image format
– ident: ref37
  doi: 10.1109/ICASSP39728.2021.9414465
– ident: ref65
  doi: 10.1109/ICME46284.2020.9102810
– ident: ref72
  doi: 10.1109/CVPR42600.2020.00271
– ident: ref9
  doi: 10.1109/ICIP.2019.8803805
– ident: ref24
  doi: 10.1109/ICCV.2015.169
– ident: ref15
  doi: 10.1109/CVPR42600.2020.00796
– ident: ref42
  doi: 10.1007/s11263-021-01491-7
– ident: ref61
  doi: 10.1109/ICASSP39728.2021.9413506
– year: 2021
  ident: ref22
  article-title: Recent Standard Development Activities on Video Coding for Machines. arXiv e-prints
– ident: ref73
  doi: 10.1109/CVPR.2018.00391
– ident: ref35
  doi: 10.1109/ICASSP40776.2020.9054165
SSID ssj0014503
Score 2.5258112
Snippet As an emerging research practice leveraging recent advanced AI techniques, e.g. deep models based prediction and generation, V ideo C oding for M achines ( VCM...
As an emerging research practice leveraging recent advanced AI techniques, e.g. deep models based prediction and generation, Video Coding for Machines (VCM) is...
SourceID proquest
pubmed
crossref
ieee
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 5174
SubjectTerms analytics taxonomy
codebook-hyperprior
Codec
Coding
compact visual representation
Data compression
Encoding
Feature extraction
Image coding
Image compression
Labels
Machine vision
multiple tasks
Neural networks
Optimization
Representations
Task analysis
Taxonomy
Video coding
Video coding for machines
Video compression
Vision systems
Visual tasks
Title Video Coding for Machines: Compact Visual Representation Compression for Intelligent Collaborative Analytics
URI https://ieeexplore.ieee.org/document/10440522
https://www.ncbi.nlm.nih.gov/pubmed/38376966
https://www.proquest.com/docview/3064706054
https://www.proquest.com/docview/2929540705
Volume 46
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE