Human-Centric Depth Estimation: A Hybrid Approach with Minimal Data
Published in | Electronics (Basel) Vol. 14; no. 11; p. 2283 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | Basel: MDPI AG, 04.06.2025 |
Summary: | This study presents a novel system for accurate camera-to-person distance estimation in CCTV environments. To address the limitations of existing approaches—which often require extensive training data and lack object-level precision—we propose a hybrid framework that integrates SAM’s zero-shot segmentation with monocular depth estimation. Our method isolates human subjects from complex backgrounds and incorporates Kernel Density Estimation (KDE), log-space learning, and linear residual blocks to improve prediction accuracy. This approach is designed to resolve the non-linear mapping between visual features and metric distances. Evaluations on a custom dataset demonstrate a mean absolute error (MAE) of 0.65 m on 1612 test images, using only 30 training samples. Notably, the use of SAM for fine-grained segmentation significantly outperforms conventional bounding box methods, reducing the MAE from 0.82 m to 0.65 m. The proposed system offers immediate applicability to security surveillance and disaster response scenarios, with its minimal data requirements enhancing its practical deployability. |
---|---|
ISSN: | 2079-9292 |
DOI: | 10.3390/electronics14112283 |
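
Illustrative sketch (not the authors' implementation): the summary names Kernel Density Estimation and log-space learning as two ingredients of the method. The Python below shows one way those two ideas could fit together, under stated assumptions. It assumes a segmentation mask has already selected the relative-depth values belonging to a person, takes the KDE mode of those values as a single robust feature, and fits a plain least-squares line in log-space as a stand-in for the paper's network with linear residual blocks. All function names, the bandwidth rule, and the sample numbers are hypothetical.

```python
import math


def kde_mode(values, bandwidth=None, grid_size=200):
    """Return the grid point with the highest Gaussian-KDE density.

    `values` stands for the relative-depth readings inside a person mask
    (an assumption of this sketch). The mode is less sensitive to stray
    background pixels than the mean.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    lo, hi = min(values), max(values)
    if lo == hi:
        return float(lo)
    if bandwidth is None:
        mean = sum(values) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        # Normal-reference rule of thumb for the bandwidth (an assumption).
        bandwidth = 1.06 * std * n ** (-0.2)
    best_x, best_density = lo, -1.0
    for i in range(grid_size):
        x = lo + (hi - lo) * i / (grid_size - 1)
        # Unnormalised density: constant factors do not move the argmax.
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def fit_log_linear(features, distances):
    """Least-squares fit of log(distance) = a * log(feature) + b."""
    xs = [math.log(f) for f in features]
    ys = [math.log(d) for d in distances]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict_distance(feature, a, b):
    """Map a depth feature back to a metric distance via the log-space fit."""
    return math.exp(a * math.log(feature) + b)


def mean_absolute_error(predicted, actual):
    """MAE in the same unit as the inputs (metres in the summary above)."""
    return sum(abs(p - t) for p, t in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Hypothetical masked depth readings: a tight cluster plus one outlier.
    feature = kde_mode([2.0, 2.1, 1.9, 2.05, 1.95, 9.0])
    # Hypothetical calibration pairs (depth feature, measured distance in m).
    a, b = fit_log_linear([0.5, 1.0, 2.0, 4.0], [6.0, 3.0, 1.5, 0.75])
    print(feature, predict_distance(feature, a, b))
```

Fitting in log-space turns a power-law relation between the depth feature and the metric distance into a straight line, which is one reason a small calibration set can suffice for a simple model. Whether this matches the paper's actual architecture is not stated in this record.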