A Small Target Detection Model Based on an Improved RT-DETR

Bibliographic Details
Published in	2024 4th International Conference on Industrial Automation, Robotics and Control Engineering (IARCE) pp. 434-438
Main Authors	Liu, Ruoyuan; Zhang, Xizheng; Jin, Shengwei; Wang, Qing; Zeng, Lijing; Liao, Junyu
Format	Conference Proceeding
Language	English
Published	IEEE 15.11.2024
Subjects	Computational modeling; Feature extraction; Mobile handsets; Object detection; RT-DETR; Semantics; Service robots; small target detection; Three-dimensional displays; Transformer cores; Transformers; YOLO

More Information
Summary:	Object detection, one of the core tasks in computer vision, has been a major focus of research in recent decades, driven by the rapid development of deep learning. From R-CNN and YOLO to SSD, RetinaNet, and the Transformer-based DETR, detection algorithms have advanced rapidly in directions including lightweight, end-to-end, 3D, small object, and cross-modal detection, and have achieved good results on various public datasets. However, small object detection has not seen significant breakthroughs, because small objects occupy few pixels in an image and often lack sufficient semantic features. In real-world scenarios they are also affected by poor lighting, blur, and occlusion, which makes them difficult to distinguish and detect. Small object detection has a wide range of applications, especially on mobile devices, where balancing detection performance against model parameter count is a challenging problem. In this paper, we adopt the RT-DETR model and replace the Basic Block in its main branch with the FasterNet Block to reduce the model's parameter count and computational cost, making it more suitable for deployment on mobile devices. We also pass the output of the P2 feature layer through an SPDConv module and fuse it with the output of the P3 feature layer to enhance the model's ability to extract features of small objects. Experimental results show that the improved model achieves good results on the VisDrone2019 drone dataset, particularly in detecting small objects.
DOI:	10.1109/IARCE64300.2024.00086
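The parameter saving from swapping the Basic Block for a FasterNet Block can be estimated with simple arithmetic. The sketch below is not the authors' code: it assumes FasterNet-style defaults of a partial convolution (PConv) over one quarter of the channels (`n_div=4`) followed by two pointwise convolutions with an expansion ratio of 2. It counts convolution weights only (no bias, BatchNorm, or shortcut projection), and all function names are hypothetical.

```python
"""Rough convolution-weight comparison: ResNet Basic Block vs. a FasterNet-style block.

Assumptions (illustrative, not from the paper): k x k kernels with no bias,
BatchNorm and activations ignored, identity shortcut, partial ratio 1/4 and
pointwise expansion ratio 2 in the FasterNet-style block.
"""


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution without bias."""
    return c_in * c_out * k * k


def basic_block_params(c: int, k: int = 3) -> int:
    """ResNet Basic Block: two k x k convolutions, c -> c -> c."""
    return 2 * conv_params(c, c, k)


def faster_block_params(c: int, k: int = 3, n_div: int = 4, expansion: int = 2) -> int:
    """FasterNet-style block: a partial convolution (PConv) that convolves only
    c // n_div channels and passes the rest through unchanged, followed by a
    1x1 expansion to expansion * c channels and a 1x1 projection back to c."""
    c_p = c // n_div
    pconv = conv_params(c_p, c_p, k)
    hidden = expansion * c
    return pconv + conv_params(c, hidden, 1) + conv_params(hidden, c, 1)


if __name__ == "__main__":
    for c in (64, 128, 256, 512):
        basic = basic_block_params(c)
        faster = faster_block_params(c)
        # the ratio prints as 0.253 at every width listed
        print(f"c={c:4d}  basic={basic:8d}  faster={faster:8d}  ratio={faster / basic:.3f}")
```

For 64 channels this gives 73,728 weights for the Basic Block against 18,688 for the FasterNet-style block, a ratio of about 0.253. The ratio is the same at every width divisible by 4. Because every convolution here is stride 1, multiply-accumulate counts per output pixel shrink by the same factor. The actual savings in the paper depend on its block configuration, which the summary does not state.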
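The SPDConv step can be illustrated by its space-to-depth rearrangement. Each 2 x 2 spatial cell is split into separate channels before a non-strided convolution, so a P2 map is brought to P3 resolution without discarding pixels. The sketch below covers only that rearrangement, in dependency-free Python on nested lists indexed `[channel][row][col]`. It omits the convolution, assumes even H and W, and models the fusion with P3 as channel concatenation. The summary does not say which fusion operation the authors use, and all names here are hypothetical.

```python
"""Space-to-depth with scale 2, the rearrangement at the front of an SPD-Conv block.

Feature maps are nested lists indexed [channel][row][col]. Assumptions: H and W
are divisible by the scale; the sub-grid order is one fixed choice; the
non-strided convolution that follows in SPD-Conv is omitted; fusion with P3 is
modelled as channel concatenation (hypothetical, not confirmed by the paper).
"""
from typing import List, Tuple

FeatureMap = List[List[List[int]]]


def shape(x: FeatureMap) -> Tuple[int, int, int]:
    """Return (C, H, W) of a nested-list feature map."""
    return len(x), len(x[0]), len(x[0][0])


def space_to_depth(x: FeatureMap, scale: int = 2) -> FeatureMap:
    """(C, H, W) -> (C * scale**2, H // scale, W // scale); every input value is kept."""
    _, h, w = shape(x)
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    out: FeatureMap = []
    for dy in range(scale):          # row offset inside each scale x scale cell
        for dx in range(scale):      # column offset inside each cell
            for channel in x:
                out.append([row[dx::scale] for row in channel[dy::scale]])
    return out


def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Stack two feature maps along the channel axis (spatial sizes must match)."""
    if shape(a)[1:] != shape(b)[1:]:
        raise ValueError("spatial sizes differ")
    return a + b


# Toy example: a 1-channel 4x4 "P2" map and a 2-channel 2x2 "P3" map.
p2 = [[[4 * r + c for c in range(4)] for r in range(4)]]
p3 = [[[0, 0], [0, 0]], [[1, 1], [1, 1]]]

spd = space_to_depth(p2)
fused = concat_channels(spd, p3)
print(shape(p2), "->", shape(spd))   # (1, 4, 4) -> (4, 2, 2)
print(spd[0])                        # [[0, 2], [8, 10]]
print(shape(fused))                  # (6, 2, 2)
```

Sampling every second row and column of `p2`, as plain subsampling or a 1x1 stride-2 convolution would, keeps only the values in `spd[0]`, a quarter of the pixels. Space-to-depth keeps all 16 values, which matters for objects only a few pixels wide. In a real network, the convolution that follows would also mix and reduce these channels.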