DSPU: An Efficient Deep Learning-Based Dense RGB-D Data Acquisition With Sensor Fusion and 3-D Perception SoC


Bibliographic Details
Published in: IEEE Journal of Solid-State Circuits, vol. 58, no. 1, pp. 177-188
Main Authors: Im, Dongseok; Park, Gwangtae; Ryu, Junha; Li, Zhiyong; Kang, Sanghoon; Han, Donghyeon; Lee, Jinsu; Park, Wonhoon; Kwon, Hankyul; Yoo, Hoi-Jun
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2023

More Information
Summary: 3-D red, green, blue, and depth (RGB-D) data and 3-D perception are essential for 3-D applications such as autonomous driving and augmented reality (AR)/virtual reality (VR) systems. However, battery- and resource-limited mobile devices have difficulty obtaining dense RGB-D data and 3-D perception information with low power (LP) and in real time. Specifically, an RGB-D sensor is used to acquire 3-D RGB-D data, but it consumes high power and produces sparse depth data. Moreover, preprocessing the RGB-D data requires a long execution time. Previous 3-D perception accelerators also have limited reconfigurability, making them incapable of executing diverse 3-D perception tasks. In this article, an LP and real-time depth signal processing system-on-chip (SoC), the depth signal processing unit (DSPU), is presented. The DSPU produces accurate dense RGB-D data using convolutional neural network (CNN)-based monocular depth estimation (MDE) and sensor fusion with an LP time-of-flight (ToF) sensor. The DSPU then performs 3-D perception by inferring a point-cloud-based neural network (PNN). The DSPU executes the depth signal processing system with the following features: 1) a unified point processing unit (UPPU) with a flexible window-based search algorithm that simplifies the complexity of point processing algorithms and saves arithmetic units and buffers; 2) a unified matrix processing unit (UMPU) with bit-slice-level sparsity exploitation to accelerate various matrix processing algorithms; 3) a band matrix encoder and decoder that reduces data transactions in the conjugate-gradient (C-Grad) method; and 4) a point feature (PF) reuse method with a pipelined architecture for low-latency and LP PNN inference. Finally, the DSPU achieves a real-time implementation of the end-to-end 3-D bounding-box (B-box) extraction system while consuming 281.6 mW.
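
Note: the summary names a band matrix encoder/decoder and the conjugate-gradient (C-Grad) method without detailing how they interact. As a rough, non-authoritative illustration of the general idea, the sketch below runs C-Grad against a band-encoded matrix-vector product in NumPy, so the matrix is never materialized densely. The function names (conjugate_gradient, band_matvec) and the tridiagonal toy system are illustrative assumptions and do not reflect the DSPU's actual hardware data path.

import numpy as np

def conjugate_gradient(matvec, b, tol=1e-6, max_iter=200):
    # Solve A x = b for a symmetric positive-definite A, given only a
    # matrix-vector product; A can stay in a compact band-encoded form.
    x = np.zeros_like(b)
    r = b - matvec(x)             # residual
    p = r.copy()                  # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def band_matvec(diags, offsets, x):
    # y = A x for a symmetric band matrix stored only by its diagonals:
    # diags[k] holds the diagonal at offset offsets[k] >= 0, so memory
    # traffic scales with the bandwidth rather than the full matrix size.
    y = np.zeros_like(x)
    for d, k in zip(diags, offsets):
        if k == 0:
            y += d * x
        else:
            y[:-k] += d * x[k:]   # super-diagonal contribution
            y[k:] += d * x[:-k]   # mirrored sub-diagonal (symmetry)
    return y

# Toy example: a diagonally dominant tridiagonal (bandwidth-1) system.
n = 8
main = np.full(n, 4.0)
off = np.full(n - 1, -1.0)
b = np.ones(n)
x = conjugate_gradient(lambda v: band_matvec([main, off], [0, 1], v), b)
print(np.allclose(band_matvec([main, off], [0, 1], x), b, atol=1e-5))  # True

In a depth-densification setting, a band-structured linear system of this kind would plausibly arise when fusing sparse ToF measurements with the CNN depth prediction, but the exact formulation is not given in this summary.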
ISSN: 0018-9200, 1558-173X
DOI: 10.1109/JSSC.2022.3218278