Lyolo: a lightweight object detection algorithm integrating label enhancement for high-quality prediction boxes

Bibliographic Details
Published in Pattern analysis and applications : PAA Vol. 28; no. 3
Main Authors Gao, Ruxin; Ling, Zhiyong; Wang, Chengyang; Li, Xiang; She, Jianmin; Liu, Qunpo
Format Journal Article
Language English
Published London Springer London 01.09.2025
Springer Nature B.V
Summary: In the field of object detection, most researchers overlook the relationship between predicted bounding boxes and ground-truth boxes. Moreover, downsampling with conventional convolution reduces image resolution and often sacrifices detail and edge information, which impairs the precise determination of object positions. The feature extraction capability of the backbone network in enhancement algorithms is also crucial to the detection performance of the entire model. To address these issues, this paper proposes LYOLO, an object detection algorithm based on high-quality prediction boxes. It introduces a Label Enhancement (LE) strategy that suppresses low-quality prediction boxes and enhances high-quality ones, effectively adjusting the weights of positive and negative samples. It also designs a lightweight downsampling method (Down) and a lightweight Feature Enhancement (FE) mechanism. The former enlarges the receptive field to improve the model’s ability to determine object positions; the latter further allocates feature weights to generate stronger feature representations for the backbone network. Experimental results on the VOC and COCO datasets show that LYOLO performs well at all model sizes, achieving the highest accuracy with the lowest number of parameters and computational complexity while maintaining low latency. For example, LYOLOn achieves an accuracy of 82.0% on the VOC dataset with only 2.28M parameters. Compared with the baseline model YOLO11n, it reduces the number of parameters by 11.9% while improving accuracy by 3.0%. Compared with YOLOv8n, YOLOv9t, and YOLOv10n, LYOLOn achieves improvements of 3.4%, 2.3%, and 3.1%, respectively. The code and datasets used in this article are available at https://github.com/lingzhiy/LYOLO.
ISSN: 1433-7541
1433-755X
DOI: 10.1007/s10044-025-01528-4