LUMDE: Light-Weight Unsupervised Monocular Depth Estimation via Knowledge Distillation

Bibliographic Details
Published in	Applied sciences Vol. 12; no. 24; p. 12593
Main Authors	Hu, Wenze; Dong, Xue; Liu, Ning; Chen, Yuanfeng
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.12.2022
Subjects	Knowledge; knowledge distillation (KD); Learning; Neural networks; pose network; Semantics; Transformer; unsupervised depth estimation
Summary:	Unsupervised monocular depth estimation networks have progressed rapidly in recent years, both because they avoid the need for ground-truth data and because monocular cameras are readily available on most autonomous devices. Although effective monocular depth estimation networks such as Monodepth2 and SC-SfMLearner have been reported, most of these approaches remain too computationally expensive for lightweight devices. In this paper, we therefore introduce LUMDE, a knowledge-distillation-based approach to the pixel-wise unsupervised monocular depth estimation task. Specifically, a teacher network distills depth information into a lightweight student network, and a pose network is integrated into the student module to improve depth performance. Moreover, following the idea of the Generative Adversarial Network (GAN), the outputs of the student and teacher networks are treated as fake and real samples, respectively, and a Transformer serves as the GAN discriminator to further improve the depth predictions. LUMDE achieves state-of-the-art (SOTA) results among knowledge distillation methods for unsupervised depth estimation and also outperforms some dense networks. Compared with the teacher network, LUMDE loses only 2.6% in δ1 accuracy on the NYUD-V2 dataset while reducing computational complexity by 95.2%.
ISSN:	2076-3417
DOI:	10.3390/app122412593
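
Note on the δ1 metric: the summary reports accuracy as δ1 without defining it. In the depth estimation literature, δ1 conventionally denotes the fraction of pixels whose ratio max(pred/gt, gt/pred) falls below 1.25. The sketch below illustrates that conventional definition only. The function name `delta_accuracy`, the validity mask on non-positive depths, and the toy arrays are assumptions made for illustration; the record does not describe the paper's exact evaluation protocol (depth caps, cropping, or scale alignment), so this is not presented as the authors' implementation.

```python
import numpy as np


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold.

    With threshold=1.25 this is the quantity usually called delta_1.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    # Assumption: non-positive depths mark missing or invalid pixels.
    valid = (gt > 0) & (pred > 0)
    if not valid.any():
        raise ValueError("no valid pixels to evaluate")

    # Symmetric ratio, so over- and under-estimation are penalised alike.
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < threshold))


# Toy depth maps (hypothetical values, in metres).
gt = np.array([[1.0, 2.0], [4.0, 0.0]])    # 0.0 marks a missing measurement
pred = np.array([[1.1, 2.6], [3.5, 1.0]])

score = delta_accuracy(pred, gt)
print(f"delta_1 = {score:.3f}")
```

In the toy example, three pixels are valid and two of them have a ratio below 1.25, so the score is 2/3. The same function could be applied to teacher and student predictions to compare their δ1 values, under whatever masking convention the evaluation actually uses.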