Finding Nonrigid Tiny Person With Densely Cropped and Local Attention Object Detector Networks in Low-Altitude Aerial Images

Bibliographic Details
Published in	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 15; pp. 4371-4385
Main Authors	Zhang, Xiangqing; Feng, Yan; Zhang, Shun; Wang, Nan; Mei, Shaohui
Format	Journal Article
Language	English
Published	Piscataway IEEE 2022 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Altitude; Bottleneck attention mechanism (BAM); Crops; Datasets; densely cropped; Detection; Detectors; Drone aircraft; Feature extraction; Image resolution; Imagery; Low altitude; Object detection; Object recognition; Optimization; Satellite imagery; small object detection; Spaceborne remote sensing; Task analysis; Training; Unmanned aerial vehicles; VisDrone2019 datasets; YOLOv5
Summary:	Detecting tiny persons in drone imagery has long been, and remains, an essential and challenging task. Unmanned aerial vehicles (UAVs) flying at high speed, at low altitude, and from multiple perspectives produce objects at widely varying scales, which complicates model optimization. Moreover, detection performance for densely packed, faintly discernible persons is far lower than for large objects in high-resolution aerial images. In this article, we introduce an image cropping strategy and an attention mechanism based on YOLOv5 to address small person detection on an optimized VisDrone2019 dataset. Specifically, we propose a Densely Cropped and Local Attention object detector Network (DCLANet), inspired by the observation that the small area occupied by small objects should be fully focused on and relatively magnified in the original image. DCLANet combines Density Map-Guided Object Detection in aerial images (DMNet) and You Only Look Twice (YOLT): Rapid Multiscale Object Detection in Satellite Imagery to crop images during the training and testing stages, and adds a bottleneck attention mechanism to the YOLOv5 baseline framework so that it focuses more on person objects than on irrelevant categories. To further improve DCLANet, we also provide a bag of useful strategies: data augmentation, label fusion, category filtering, and hyperparameter evolution. Extensive experiments on VisDrone2019 show that DCLANet achieves state-of-the-art performance: the person-category detection result $AP^{\text{val}}$@0.5 is 50.04% on the test-dev subset, which is substantially better than the previous SOTA method (DPNetV3) by 12.01%. In addition, on our optimized VisDrone2019 dataset, $AP^{\text{val}}$@0.5 and $AP^{\text{test}}$@0.5 reach 74.95% and 62.18%, respectively. Compared with YOLOv5, DCLANet improves by about 3.8%, which is encouraging and competitive.
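Note:	The summary credits YOLT-style cropping for relatively magnifying small objects. As a rough illustration of that general idea only, the sketch below splits a large image into overlapping tiles and maps a tile-local box back to full-image coordinates. It is not DCLANet's actual cropping procedure: the tile size, the pixel overlap, and all function names (`tile_origins`, `make_tiles`, `to_full_image`) are assumptions chosen for this example, and the density-map guidance from DMNet is not modeled.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; a hypothetical box convention.
Box = Tuple[int, int, int, int]


def tile_origins(length: int, tile: int, overlap_px: int) -> List[int]:
    """Start offsets of overlapping windows that cover [0, length)."""
    if overlap_px < 0 or overlap_px >= tile:
        raise ValueError("overlap_px must satisfy 0 <= overlap_px < tile")
    if length <= tile:
        return [0]  # one window already covers the whole axis
    stride = tile - overlap_px
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last window sits flush with the border
    return origins


def make_tiles(width: int, height: int,
               tile: int = 640, overlap_px: int = 128) -> List[Box]:
    """Crop rectangles for a width x height image, in row-major order."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap_px)
        for x in tile_origins(width, tile, overlap_px)
    ]


def to_full_image(box: Box, tile_box: Box) -> Box:
    """Shift a box predicted inside a tile back to full-image coordinates."""
    ox, oy = tile_box[0], tile_box[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)
```

A detector would be run on each tile, and the shifted boxes from overlapping tiles would then be merged, for example with non-maximum suppression, a step this sketch leaves out.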
ISSN:	1939-1404 2151-1535
DOI:	10.1109/JSTARS.2022.3175498