Human-Centric Depth Estimation: A Hybrid Approach with Minimal Data

Bibliographic Details
Published in: Electronics (Basel), Vol. 14, no. 11, p. 2283
Main Authors: Kim, Yuhyun; Ahn, Heejin; Kim, Taeseop; Ahn, Byungtae; Choi, Dong-Geol
Format: Journal Article
Language: English
Published: Basel, MDPI AG, 04.06.2025
Summary: This study presents a novel system for accurate camera-to-person distance estimation in CCTV environments. Existing approaches often require extensive training data and lack object-level precision; to address these limitations, we propose a hybrid framework that integrates the zero-shot segmentation of SAM (Segment Anything Model) with monocular depth estimation. Our method isolates human subjects from complex backgrounds and incorporates Kernel Density Estimation (KDE), log-space learning, and linear residual blocks to improve prediction accuracy. Together, these components are designed to capture the non-linear mapping between visual features and metric distance. Evaluations on a custom dataset yield a mean absolute error (MAE) of 0.65 m on 1612 test images, using only 30 training samples. Notably, using SAM for fine-grained segmentation significantly outperforms conventional bounding-box methods, reducing the MAE from 0.82 m to 0.65 m. The proposed system is directly applicable to security surveillance and disaster response scenarios, and its minimal data requirements make it practical to deploy.
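The summary names the pipeline's ingredients (a person mask applied to a monocular depth map, KDE, log-space learning, MAE evaluation) without implementation details. The following is a minimal, standard-library-only Python sketch under stated assumptions, not the authors' implementation. The function names `masked_values`, `kde_mode`, `fit_log_linear`, `predict_distance`, and `mean_absolute_error` are hypothetical. The mask is assumed to be a boolean grid such as one derived from a SAM output. The KDE uses a Gaussian kernel with Silverman's rule-of-thumb bandwidth, and its peak is taken as the person's representative depth feature. A plain log-log least-squares fit stands in for the paper's learned model; the linear residual blocks are not reproduced here.

```python
import math
from statistics import pstdev


def masked_values(depth_map, mask):
    """Collect depth values at pixels where the (assumed boolean) person mask is True."""
    return [d for row_d, row_m in zip(depth_map, mask)
            for d, m in zip(row_d, row_m) if m]


def kde_mode(values, grid_size=256):
    """Return the peak of a Gaussian KDE over `values` (assumes grid_size >= 2).

    Assumption: Silverman's rule of thumb for the bandwidth and a uniform
    evaluation grid between min(values) and max(values).
    """
    n = len(values)
    if n == 0:
        raise ValueError("mask selects no pixels")
    sigma = pstdev(values)
    if sigma == 0:
        return values[0]  # all values identical: that value is the mode
    h = 1.06 * sigma * n ** (-1 / 5)  # Silverman's rule of thumb
    lo, hi = min(values), max(values)
    best_x, best_density = lo, -1.0
    for i in range(grid_size):
        x = lo + (hi - lo) * i / (grid_size - 1)
        # Unnormalized kernel sum; the constant factor does not move the argmax.
        density = sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def fit_log_linear(features, distances):
    """Least-squares fit of log(distance) = a * log(feature) + b.

    Stand-in for a learned log-space regressor; features and distances must be > 0.
    """
    xs = [math.log(f) for f in features]
    ys = [math.log(d) for d in distances]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def predict_distance(feature, a, b):
    """Map a feature to metres by exponentiating the log-space prediction."""
    return math.exp(a * math.log(feature) + b)


def mean_absolute_error(pred, true):
    """Mean absolute error, in the same unit as the inputs (metres here)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)
```

For example, with synthetic pairs where `distance = 2.0 * feature ** 1.5`, `fit_log_linear` recovers `a ≈ 1.5` and `b ≈ log(2)`. Fitting in log space turns this power-law relationship into a straight line, which is one plausible reading of why log-space learning helps with a non-linear feature-to-distance mapping. Taking the KDE peak rather than the mean keeps a few background pixels that leak into the mask from dragging the depth estimate.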
ISSN: 2079-9292
DOI: 10.3390/electronics14112283