Ray Reordering for Hardware-Accelerated Neural Volume Rendering

Bibliographic Details
Published in	IEEE transactions on circuits and systems for video technology p. 1
Main Authors	Ding, Junran; He, Yunxiang; Yuan, Binzhe; Yuan, Zhechen; Zhou, Pingqiang; Yu, Jingyi; Lou, Xin
Format	Journal Article
Language	English
Published	IEEE 26.06.2024
Subjects	Cache Locality; Casting; Hardware; Hardware Accelerator; Image color analysis; Interpolation; Neural networks; Neural Volume Rendering (NVR); Parallel processing; Ray Reordering; Rendering (computer graphics)
Summary:	Neural Volume Rendering (NVR) has advanced explosively since the advent of Neural Radiance Field (NeRF), a technique for novel view synthesis of complex scenes based on a finite set of input views. Existing ray casting-based NVR approaches process rays concurrently to leverage parallelism but fail to consider the impact of this on cache locality, which ultimately undermines the efficiency of the corresponding dedicated hardware accelerator designs. We further observe that there is a spatial correspondence between features and voxels in NVR that can be exploited by processing in voxel order rather than ray order. This paper introduces a novel approach that reorders the execution of rays so that rays with similar memory access patterns are processed in parallel, thereby enhancing cache locality. Building on this, we also propose an efficient backend architecture and a corresponding memory subsystem that facilitate accurate data prefetching to hide off-chip memory latency. To validate the proposed architecture, we implement our design in Verilog HDL and evaluate its performance by post-synthesis simulation with real scene data. The evaluation results demonstrate that our design markedly enhances the efficiency of NVR processing, achieving a 1.62× speedup over the state-of-the-art NVR accelerator while requiring 5.12× less silicon area and 32.79× less power.
ISSN:	1051-8215 1558-2205
DOI:	10.1109/TCSVT.2024.3419761