Feature Compression for Cloud-Edge Multimodal 3D Object Detection
Main Authors | , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 06.09.2024 |
Summary: | Machine vision systems, which can efficiently manage extensive visual
perception tasks, are becoming increasingly popular in industrial production
and daily life. Because a single sensor struggles to capture accurate depth
and texture information simultaneously, multimodal data captured by
cameras and LiDAR is commonly used to enhance performance. Additionally,
cloud-edge cooperation has emerged as a novel computing approach to improve
user experience and ensure data security in machine vision systems. This paper
proposes a pioneering solution to address the feature compression problem in
multimodal 3D object detection. Given a sparse tensor-based object detection
network at the edge device, we introduce two modes to accommodate different
application requirements: Transmission-Friendly Feature Compression (T-FFC) and
Accuracy-Friendly Feature Compression (A-FFC). In T-FFC mode, only the output
of the last layer of the network's backbone is transmitted from the edge
device. The received feature is processed at the cloud device through a channel
expansion module and two spatial upsampling modules to generate multi-scale
features. In A-FFC mode, we expand upon the T-FFC mode by transmitting two
additional types of features. These added features enable the cloud device to
generate more accurate multi-scale features. Experimental results on the KITTI
dataset using the VirConv-L detection network showed that T-FFC compressed
the features by a factor of 6061 with less than a 3% reduction in
detection performance. By contrast, A-FFC compressed the features by a
factor of about 901 with almost no degradation in detection performance. We
also designed optional residual extraction and 3D object reconstruction modules
to facilitate the reconstruction of detected objects. The reconstructed objects
effectively reflected details of the original objects. |
---|---|
DOI: | 10.48550/arxiv.2409.04123 |
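
The summary describes the T-FFC cloud-side flow only in words: one received feature passes through a channel expansion module and two spatial upsampling modules to yield multi-scale features. The sketch below illustrates that data flow and nothing more. It is not the paper's implementation: it assumes dense 2D feature maps held in nested lists (the paper works with a sparse tensor-based network), a fixed per-position linear map standing in for the channel expansion module, and nearest-neighbour 2x upsampling standing in for the learned upsampling modules. The names `expand_channels`, `upsample2x`, `tffc_cloud_decode`, and `shape` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]   # one channel, H x W
Feature = List[Grid]       # C channels


def expand_channels(feat: Feature, weights: List[List[float]]) -> Feature:
    """Per-position linear map from C_in to C_out channels.

    Assumed stand-in for the channel expansion module; `weights` has one
    row per output channel and one column per input channel.
    """
    h, w = len(feat[0]), len(feat[0][0])
    return [
        [[sum(wk * feat[k][i][j] for k, wk in enumerate(row)) for j in range(w)]
         for i in range(h)]
        for row in weights
    ]


def upsample2x(feat: Feature) -> Feature:
    """Nearest-neighbour 2x spatial upsampling of every channel.

    Assumed stand-in for one spatial upsampling module.
    """
    out: Feature = []
    for channel in feat:
        rows: Grid = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # duplicate each column
            rows.append(wide)
            rows.append(list(wide))                    # duplicate each row
        out.append(rows)
    return out


def tffc_cloud_decode(received: Feature, weights: List[List[float]]) -> List[Feature]:
    """Turn the single received feature into three scales, coarse to fine."""
    coarse = expand_channels(received, weights)  # channel expansion
    mid = upsample2x(coarse)                     # first upsampling stage
    fine = upsample2x(mid)                       # second upsampling stage
    return [coarse, mid, fine]


def shape(feat: Feature) -> Tuple[int, int, int]:
    """Return (channels, height, width) of a feature."""
    return (len(feat), len(feat[0]), len(feat[0][0]))
```

Calling `tffc_cloud_decode` on a 2-channel 2x3 input with a 4x2 weight matrix returns three features with 4 channels each, at 2x3, 4x6, and 8x12 resolution. In A-FFC mode the summary states that two additional feature types are transmitted so the cloud can generate more accurate multi-scale features; how they are combined is not described here, so the sketch does not attempt it.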