I$^2$VC: A Unified Framework for Intra- & Inter-frame Video Compression
Main Authors | , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 23.05.2024 |
Summary: | Video compression aims to reconstruct seamless frames by encoding the motion
and residual information from existing frames. Previous neural video
compression methods require distinct codecs for the three frame types
(I-frame, P-frame and B-frame), which hinders a unified approach and
generalization across different video contexts. Intra-codec techniques lack the
advanced Motion Estimation and Motion Compensation (MEMC) found in inter-codecs,
leading to fragmented, non-uniform frameworks. Our proposed Intra- &
Inter-frame Video Compression (I$^2$VC) framework employs a single
spatio-temporal codec that adapts feature compression rates to content
importance. This unified codec recasts the dependence across frames as a
conditional coding scheme, thus integrating intra- and inter-frame compression
into one cohesive strategy. Without explicit motion data, achieving competent
inter-frame compression with only a conditional codec is challenging. To
resolve this, our approach includes an implicit inter-frame alignment
mechanism. Within a pre-trained diffusion denoising process, a
diffusion-inverted reference feature, rather than random noise, serves as the
initial compression state. This allows motion-rich regions to be selectively
denoised based on decoded features, enabling accurate alignment without MEMC.
Our experiments across various compression configurations (All Intra, Low
Delay and Random Access: AI, LD and RA) and frame types show that I$^2$VC
outperforms state-of-the-art perceptual learned codecs. Notably, it achieves a
58.4% improvement in perceptual reconstruction performance when benchmarked
against the H.266/VVC standard (VTM). The official implementation is available
at https://github.com/GYukai/I2VC. |
DOI: | 10.48550/arxiv.2405.14336 |
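The abstract's key structural claim is that I-, P- and B-frames need not have separate codecs. Once inter-frame dependence is expressed as conditional coding, the frame type reduces to which already-decoded frames the single codec is conditioned on. The Python sketch below illustrates only that bookkeeping, under our own assumptions. `CodingStep`, `coding_schedule`, `_hierarchical_b` and the `gop` parameter are hypothetical names, and the AI/LD/RA reference structures are deliberately simplified; real encoder configurations, such as those of VTM, are more elaborate. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingStep:
    frame: int               # display index of the frame being coded
    refs: tuple[int, ...]    # already-decoded frames the codec is conditioned on

    @property
    def frame_type(self) -> str:
        # With one conditional codec, the frame "type" is just the size of the condition set.
        return {0: "I", 1: "P", 2: "B"}[len(self.refs)]


def coding_schedule(num_frames: int, config: str, gop: int = 8) -> list[CodingStep]:
    """Hypothetical, simplified coding order and condition sets for AI / LD / RA."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if config == "AI":
        # All Intra: every frame is coded with an empty condition set.
        return [CodingStep(i, ()) for i in range(num_frames)]
    if config == "LD":
        # Low Delay: frame 0 is intra; every later frame conditions on its predecessor.
        return [CodingStep(0, ())] + [CodingStep(i, (i - 1,)) for i in range(1, num_frames)]
    if config == "RA":
        # Random Access: code each GOP's last frame first, then fill in hierarchically.
        steps = [CodingStep(0, ())]
        start = 0
        while start < num_frames - 1:
            end = min(start + gop, num_frames - 1)
            steps.append(CodingStep(end, (start,)))  # simplification: forward-only anchor
            steps.extend(_hierarchical_b(start, end))
            start = end
        return steps
    raise ValueError(f"unknown configuration: {config!r}")


def _hierarchical_b(left: int, right: int) -> list[CodingStep]:
    # Code the midpoint from both decoded neighbours, then recurse into each half.
    if right - left < 2:
        return []
    mid = (left + right) // 2
    return [CodingStep(mid, (left, right))] + _hierarchical_b(left, mid) + _hierarchical_b(mid, right)


if __name__ == "__main__":
    for cfg in ("AI", "LD", "RA"):
        order = coding_schedule(9, cfg)
        print(cfg, [(s.frame, s.frame_type) for s in order])
```

For RA with 9 frames and the default `gop`, the coding order is 0, 8, 4, 2, 1, 3, 6, 5, 7: frame 0 is I, frame 8 is P, and every remaining frame is B. The same codec call would serve all three cases; only `refs` changes.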
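The implicit alignment idea, which starts denoising from a diffusion-inverted reference feature instead of random noise and denoises only motion-rich regions, can be illustrated with standard deterministic DDIM inversion. This is a generic NumPy sketch under explicit assumptions, not the authors' procedure. The linear beta schedule, the function names, the binary `mask` marking motion-rich positions, and the blending rule that pins unmasked positions to the reference trajectory (borrowed from mask-guided diffusion inpainting) are all our own choices. `eps_model` is a hypothetical stand-in for a pre-trained noise-prediction network; the abstract states only that denoising is driven by decoded features.

```python
import numpy as np


def alpha_bar_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative alpha_bar_t of a linear DDPM beta schedule, with alpha_bar_0 = 1 (clean)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])


def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM update (eta = 0) from noise level a_from to a_to."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps


def ddim_invert(x0, eps_model, alphas):
    """Run DDIM from clean to noisy; return the whole latent trajectory."""
    traj = [x0]
    for t in range(len(alphas) - 1):
        traj.append(ddim_step(traj[-1], eps_model(traj[-1], t), alphas[t], alphas[t + 1]))
    return traj


def selective_denoise(ref_traj, mask, eps_model, alphas):
    """Denoise starting from the inverted reference instead of random noise.

    mask == 1 marks (hypothetical) motion-rich positions that are regenerated;
    mask == 0 positions are pinned to the reference trajectory at every step.
    """
    x = ref_traj[-1]  # initial state: inverted reference, not N(0, I)
    for t in range(len(alphas) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, t), alphas[t], alphas[t - 1])
        x = mask * x + (1.0 - mask) * ref_traj[t - 1]
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alphas = alpha_bar_schedule(50)
    ref = rng.standard_normal((4, 4))          # toy "reference feature"
    mask = np.zeros_like(ref)
    mask[1:3, 1:3] = 1.0                        # toy "motion-rich" region
    toy_eps = lambda x, t: 0.1 * x              # placeholder for a trained network
    out = selective_denoise(ddim_invert(ref, toy_eps, alphas), mask, toy_eps, alphas)
    print(np.abs(out - ref)[mask == 1].max())
```

With a constant `eps_model` and an all-ones mask, inversion followed by denoising reproduces the reference exactly, because each denoising step algebraically undoes the matching inversion step. A real network evaluates the noise at different points in the two directions, which is why DDIM inversion is only approximate in practice.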