RD-NERF: Neural Robust Distilled Feature Fields for Sparse-View Scene Segmentation

Bibliographic Details
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3470 - 3474
Main Authors: Ma, Yongjia; Dou, Bin; Zhang, Tianyu; Yuan, Zejian
Format: Conference Proceeding
Language: English
Published: IEEE, 14.04.2024
Summary: We propose Neural Robust Distilled Feature Fields (RD-NeRF) for robust 3D semantic feature distillation and 3D-consistent scene segmentation with sparse-view labels. Specifically, we introduce a two-stage pipeline. In the distillation stage, we employ the pre-trained image feature extractor DINO-ViT as the teacher network. RD-NeRF distills semantic knowledge into 3D space and uses the Vector-Matrix (VM) tensor decomposition to represent the semantic field with volumetric rendering. During training, we apply distance-wise and angle-wise distillation losses, which enable the student network to capture high-level semantics, enhance scene reconstruction and segmentation performance, and improve the robustness and effectiveness of distillation. In the segmentation stage, hash features and the distilled semantic features are fed into the segmentation MLP, which is supervised by the sparse-view labels. Experimental results demonstrate that our model performs well in 3D-consistent scene segmentation under sparse-view supervision.
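
As a rough illustration of the distance-wise and angle-wise distillation losses mentioned in the summary, the sketch below follows the common relational-distillation formulation (pairwise-distance and triplet-angle matching). It is not the authors' code; the feature names, dimensions, and batch shapes are assumptions, standing in for rendered student features and projected DINO-ViT teacher features at sampled rays.

```python
# Minimal sketch (assumed implementation, not the authors' code) of
# distance-wise and angle-wise relational distillation losses.
import torch
import torch.nn.functional as F


def normalized_pairwise_distances(feats: torch.Tensor) -> torch.Tensor:
    """All pairwise Euclidean distances within a batch, normalized by their mean."""
    d = torch.cdist(feats, feats, p=2)          # (N, N)
    mean = d[d > 0].mean()                      # mean over non-zero (off-diagonal) entries
    return d / (mean + 1e-8)


def distance_wise_loss(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    """Match the pairwise-distance structure of student and teacher features."""
    return F.smooth_l1_loss(normalized_pairwise_distances(student),
                            normalized_pairwise_distances(teacher))


def angle_wise_loss(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    """Match the angles formed by feature triplets (cosine at the middle point)."""
    def triplet_cosines(feats: torch.Tensor) -> torch.Tensor:
        diff = feats.unsqueeze(1) - feats.unsqueeze(0)   # diff[i, j] = f_i - f_j, (N, N, D)
        diff = F.normalize(diff, p=2, dim=2)
        # cos of the angle at vertex j between points i and k, for all triplets (i, j, k)
        return torch.einsum('ijd,kjd->ijk', diff, diff)  # (N, N, N)
    return F.smooth_l1_loss(triplet_cosines(student), triplet_cosines(teacher))


if __name__ == "__main__":
    # Hypothetical shapes: 32 sampled rays, 64-dim rendered student features,
    # 384-dim teacher features (e.g. DINO-ViT). The relational losses compare
    # only the (N, N) / (N, N, N) relation tensors, so the two feature
    # dimensions need not match.
    student_feats = torch.randn(32, 64)
    teacher_feats = torch.randn(32, 384)
    loss = distance_wise_loss(student_feats, teacher_feats) \
         + angle_wise_loss(student_feats, teacher_feats)
    print(float(loss))
```

Because these losses supervise only the relative structure among features rather than their absolute values, they are commonly used to make distillation less sensitive to differences in feature dimension and scale between teacher and student, which is consistent with the robustness claim in the summary.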
ISSN: 2379-190X
DOI: 10.1109/ICASSP48485.2024.10447068