Estimating Depth From Monocular Images as Classification Using Deep Fully Convolutional Residual Networks

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 28, No. 11, pp. 3174-3182
Main Authors: Cao, Yuanzhouhan; Wu, Zifeng; Shen, Chunhua
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2018

Summary: Depth estimation from single monocular images is a key component of scene understanding. Most existing algorithms formulate depth estimation as a regression problem because depth is continuous. However, depth values can hardly be regressed exactly to their ground-truth values. In this paper, we propose to formulate depth estimation as a pixelwise classification task. Specifically, we first discretize the continuous ground-truth depths into several bins and label the bins according to their depth ranges. We then solve the depth estimation problem as classification by training a fully convolutional deep residual network. Compared with estimating the exact depth of a single point, estimating its depth range is easier. More importantly, by performing depth classification instead of regression, we can easily obtain the confidence of each depth prediction in the form of a probability distribution. With this confidence, we can apply an information gain loss that exploits predictions close to the ground truth during training, as well as fully connected conditional random fields for post-processing, to further improve performance. We test the proposed method on both indoor and outdoor benchmark RGB-Depth datasets and achieve state-of-the-art performance.
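The summary describes two ingredients that can be illustrated concretely: discretizing continuous depths into labeled bins, and an information gain loss that credits predictions whose bins lie close to the ground-truth bin. The sketch below shows one plausible realization of both ideas in PyTorch. The bin count, depth range, log-spaced bin edges, and the Gaussian weighting parameter alpha are illustrative assumptions, not the paper's exact settings.

```python
import math
import torch
import torch.nn.functional as F

NUM_BINS = 50              # assumed number of discretization bins
D_MIN, D_MAX = 0.7, 10.0   # assumed valid depth range in metres

# Bin edges spaced uniformly in log-depth (an assumed, common choice
# for depth discretization; the paper may use a different scheme).
edges = torch.logspace(math.log10(D_MIN), math.log10(D_MAX), NUM_BINS + 1)

def depth_to_bins(depth: torch.Tensor) -> torch.Tensor:
    """Discretize continuous ground-truth depths into integer bin labels."""
    depth = depth.clamp(D_MIN, D_MAX)
    # Compare against interior edges only, so labels fall in [0, NUM_BINS - 1].
    return torch.bucketize(depth, edges[1:-1])

def information_gain_loss(logits: torch.Tensor,
                          target_bins: torch.Tensor,
                          alpha: float = 0.2) -> torch.Tensor:
    """Soft cross-entropy that also credits predictions whose bins are close
    to the ground-truth bin.
    logits: (B, NUM_BINS, H, W); target_bins: (B, H, W).
    alpha (assumed value) controls how fast credit decays with bin distance."""
    log_p = F.log_softmax(logits, dim=1)
    bins = torch.arange(NUM_BINS, device=logits.device).view(1, -1, 1, 1)
    dist2 = (bins - target_bins.unsqueeze(1)).float() ** 2
    weights = torch.exp(-alpha * dist2)        # Gaussian credit by bin distance
    weights = weights / weights.sum(dim=1, keepdim=True)
    return -(weights * log_p).sum(dim=1).mean()

# Example: a random "network output" and a random ground-truth depth map.
logits = torch.randn(2, NUM_BINS, 8, 8)
gt_depth = torch.rand(2, 8, 8) * (D_MAX - D_MIN) + D_MIN
loss = information_gain_loss(logits, depth_to_bins(gt_depth))
```

Normalizing the Gaussian weights turns them into a soft target distribution over bins, so the loss reduces to ordinary per-pixel cross-entropy as alpha grows large; smaller alpha spreads credit to neighboring depth ranges, which is the behavior the abstract attributes to the information gain loss. The CRF post-processing step is not covered by this sketch.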
ISSN: 1051-8215
EISSN: 1558-2205
DOI: 10.1109/TCSVT.2017.2740321