Collaborative Deconvolutional Neural Networks for Joint Depth Estimation and Semantic Segmentation

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, No. 11, pp. 5655-5666
Main Authors: Liu, Jing; Wang, Yuhang; Li, Yong; Fu, Jun; Li, Jiangyun; Lu, Hanqing
Format: Journal Article
Language: English
Published: United States, The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2018

Summary: Semantic segmentation and single-view depth estimation are two fundamental problems in computer vision. They exploit the semantic and geometric properties of images, respectively, and are thus complementary for scene understanding. In this paper, we propose a collaborative deconvolutional neural network (C-DCNN) to jointly model the two problems so that each promotes the other. The C-DCNN consists of two DCNNs, one per task, which provide a finer-resolution reconstruction method and are pretrained with hierarchical supervision. The feature maps from the two DCNNs are integrated via a pointwise bilinear layer, which fuses the semantic and depth information and produces higher-order features. The integrated features are then fed into two sibling classification layers that simultaneously learn semantic segmentation and depth estimation. In this way, we combine the semantic and depth features in a unified deep network and jointly train them to benefit each other. Specifically, during network training we treat depth estimation as a classification problem: a soft mapping strategy maps the continuous depth values into discrete probability distributions, and the cross-entropy loss is used. In addition, a fully connected conditional random field is applied as postprocessing to further improve semantic segmentation, jointly considering the proximity relations of pixels in position, intensity, and depth. We evaluate our approach on two challenging benchmarks, NYU Depth V2 and SUN RGB-D, and demonstrate that it effectively exploits the two kinds of information, achieving state-of-the-art results on both the semantic segmentation and depth estimation tasks.
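The soft mapping and the pointwise bilinear fusion described in the summary can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the Gaussian weighting of the soft labels, the log-spaced bin layout, and all function names are assumptions introduced here for clarity.

```python
import numpy as np

def soft_depth_labels(depth, bin_centers, sigma=0.5):
    """Map a continuous depth value to a soft distribution over discrete
    depth bins. Gaussian weighting by distance to each bin center is one
    plausible realization of the soft mapping; the paper's exact scheme
    may differ."""
    w = np.exp(-((bin_centers - depth) ** 2) / (2 * sigma ** 2))
    return w / w.sum()

def cross_entropy(soft_target, predicted_probs, eps=1e-12):
    """Cross-entropy between the soft target distribution and a
    predicted distribution over the same bins."""
    return -np.sum(soft_target * np.log(predicted_probs + eps))

def pointwise_bilinear(feat_a, feat_b):
    """Fuse two feature maps of shapes (C1, H, W) and (C2, H, W) via a
    per-pixel outer product, producing higher-order (C1*C2, H, W)
    features, in the spirit of the pointwise bilinear layer."""
    c1, h, w = feat_a.shape
    c2 = feat_b.shape[0]
    return np.einsum('ihw,jhw->ijhw', feat_a, feat_b).reshape(c1 * c2, h, w)

# Illustrative usage: 10 log-spaced depth bins between 0.5 m and 10 m.
bins = np.geomspace(0.5, 10.0, 10)
target = soft_depth_labels(2.3, bins)  # sums to 1, peaks at the nearest bin
```

Because the soft target spreads probability mass over neighboring bins, the cross-entropy loss penalizes near-miss predictions less than distant ones, which is the motivation for discretizing depth this way instead of using hard one-hot labels.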
ISSN: 2162-237X
EISSN: 2162-2388
DOI: 10.1109/TNNLS.2017.2787781