E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 17.07.2022 |
Summary: Recently, the image-wise implicit neural representation of videos, NeRV, has gained popularity for its promising results and swift speed compared to regular pixel-wise implicit representations. However, the redundant parameters within the network structure can cause a large model size when scaling up for desirable performance. The key reason for this phenomenon is the coupled formulation of NeRV, which outputs the spatial and temporal information of video frames directly from the frame index input. In this paper, we propose E-NeRV, which dramatically expedites NeRV by decomposing the image-wise implicit neural representation into separate spatial and temporal contexts. Under the guidance of this new formulation, our model greatly reduces the redundant model parameters while retaining the representation ability. We experimentally find that our method substantially improves performance with fewer parameters, resulting in more than $8\times$ faster convergence. Code is available at https://github.com/kyleleey/E-NeRV.
DOI: 10.48550/arxiv.2207.08132
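The summary above contrasts NeRV's coupled formulation, where one network maps the frame index directly to a full frame, with E-NeRV's disentangled spatial and temporal contexts. The PyTorch sketch below only illustrates that idea under assumptions of our own: the layer widths, the sinusoidal encoding, the multiplicative fusion of spatial and temporal features, and the class names `CoupledNeRV` and `DisentangledNeRV` are not taken from the paper; the authors' actual implementation is at the GitHub link in the summary.

```python
# Minimal sketch (not the authors' implementation) contrasting a coupled
# NeRV-style model with a disentangled spatial-temporal variant.
# Layer widths, the multiplicative fusion, and the class names are
# illustrative assumptions, not E-NeRV's actual architecture.
import math

import torch
import torch.nn as nn


def positional_encoding(x: torch.Tensor, num_freqs: int = 8) -> torch.Tensor:
    """Sinusoidal encoding of values in [0, 1] (frame index or pixel coordinate)."""
    freqs = (2.0 ** torch.arange(num_freqs, device=x.device)) * math.pi
    angles = x[..., None] * freqs                        # (..., num_freqs)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)


class CoupledNeRV(nn.Module):
    """NeRV-style coupling: one MLP maps the frame index straight to a spatial
    feature map, so spatial and temporal information share the same weights."""

    def __init__(self, feat_hw=(9, 16), feat_dim=64, num_freqs=8):
        super().__init__()
        self.feat_hw, self.feat_dim, self.num_freqs = feat_hw, feat_dim, num_freqs
        h, w = feat_hw
        # This index-to-feature-map projection dominates the parameter count.
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_freqs, 512), nn.GELU(),
            nn.Linear(512, h * w * feat_dim),
        )
        # Conv + PixelShuffle upsampling head (kept tiny for the sketch).
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_dim, 4 * feat_dim, 3, padding=1), nn.PixelShuffle(2), nn.GELU(),
            nn.Conv2d(feat_dim, 3, 3, padding=1),
        )

    def forward(self, t):                                # t: (B,) normalized frame indices
        h, w = self.feat_hw
        feat = self.mlp(positional_encoding(t, self.num_freqs))
        return self.decoder(feat.view(-1, self.feat_dim, h, w))      # (B, 3, 2h, 2w)


class DisentangledNeRV(nn.Module):
    """Disentangled sketch: a fixed coordinate grid carries the spatial context,
    a small temporal MLP modulates it per frame, and the large
    index-to-feature-map MLP is gone."""

    def __init__(self, feat_hw=(9, 16), feat_dim=64, num_freqs=8):
        super().__init__()
        self.num_freqs = num_freqs
        h, w = feat_hw
        ys, xs = torch.meshgrid(
            torch.linspace(0, 1, h), torch.linspace(0, 1, w), indexing="ij"
        )
        grid = torch.cat(
            [positional_encoding(xs, num_freqs), positional_encoding(ys, num_freqs)], dim=-1
        )                                                # (h, w, 4 * num_freqs)
        self.register_buffer("grid", grid)
        self.spatial_mlp = nn.Sequential(nn.Linear(4 * num_freqs, feat_dim), nn.GELU())
        self.temporal_mlp = nn.Sequential(               # much smaller than the coupled MLP
            nn.Linear(2 * num_freqs, 128), nn.GELU(), nn.Linear(128, feat_dim),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_dim, 4 * feat_dim, 3, padding=1), nn.PixelShuffle(2), nn.GELU(),
            nn.Conv2d(feat_dim, 3, 3, padding=1),
        )

    def forward(self, t):
        spatial = self.spatial_mlp(self.grid)                                 # (h, w, C)
        temporal = self.temporal_mlp(positional_encoding(t, self.num_freqs))  # (B, C)
        feat = spatial[None] * temporal[:, None, None, :]                     # per-frame modulation
        return self.decoder(feat.permute(0, 3, 1, 2))                         # (B, 3, 2h, 2w)


if __name__ == "__main__":
    t = torch.rand(2)                                    # two normalized frame indices
    for model in (CoupledNeRV(), DisentangledNeRV()):
        n_params = sum(p.numel() for p in model.parameters())
        print(type(model).__name__, tuple(model(t).shape), f"{n_params / 1e6:.2f}M params")
```

Running the script prints output shapes and parameter counts for both toy models; in this sketch the index-to-feature-map MLP of the coupled variant accounts for nearly all of its parameters, which is the kind of redundancy the summary attributes to NeRV's coupled formulation.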