Multiscale features integration based multiple-in-single-out network for object detection

Bibliographic Details
Published in Image and Vision Computing, Vol. 135, p. 104714
Main Authors Yang, Kequan; Li, Jide; Dai, Songmin; Li, Xiaoqiang
Format Journal Article
Language English
Published Elsevier B.V., 01.07.2023

Summary:
• Propose a solution that covers multiscale objects with a single-level feature map.
• A feature integration module is used to recover the missing multiscale information.
• Multiscale contexts are extracted to extend the scale range of the receptive fields.
Object detection based on a single-level feature map has been a challenging task due to the limited feature scale. Enriching the multiscale information of single-level features is therefore considered a promising way to address this challenge. Although most existing methods attempt to augment the feature scale of single-level features, their detection performance remains unsatisfactory because they mine multiscale features from only a one-level feature map. To address this problem, we propose a multiple-in-single-out network (MiSoNet) that integrates multiscale information from multilevel feature maps into a single-level feature map. To achieve this, MiSoNet’s key component is equipped with two cascaded modules: a multilevel feature integration module (MFIM) and a depthwise convolutional residual encoder (DWEncoder). Specifically, MFIM adaptively fuses features of inconsistent semantics and scales from the multilevel feature maps, while DWEncoder stacks several residual blocks with depthwise convolutions to extract multiscale contexts in the single feature map, further extending the scale range of the receptive fields. Extensive experiments on the Common Objects in Context (COCO) dataset show that MiSoNet achieves 41.0 AP, surpassing YOLOF by 1.4 AP with negligible computational overhead. Moreover, with fewer parameters and FLOPs, MiSoNet outperforms several advanced detectors based on the feature pyramid network.
ISSN: 0262-8856, 1872-8138
DOI: 10.1016/j.imavis.2023.104714
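The abstract above describes a multiple-in-single-out neck: multilevel backbone features are fused into one map (MFIM) and then refined by residual blocks with depthwise convolutions (DWEncoder) to widen the receptive-field range. The PyTorch sketch below is only a minimal illustration of that general idea, not the authors' implementation: the module names (MultiLevelFusion, DepthwiseResidualBlock, SingleOutNeck), channel counts, softmax-weighted fusion, and dilation schedule are all assumptions.

```python
# Hypothetical sketch of the MiSoNet idea from the abstract: fuse multilevel
# features into a single map, then refine it with depthwise-conv residual blocks.
# All design details here (names, channels, fusion scheme) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiLevelFusion(nn.Module):
    """Resize multilevel features to a common size and fuse them with
    learned per-level weights (an MFIM-like stand-in)."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        self.weights = nn.Parameter(torch.ones(len(in_channels)))

    def forward(self, feats, target_size):
        w = torch.softmax(self.weights, dim=0)
        fused = 0
        for i, (conv, f) in enumerate(zip(self.lateral, feats)):
            f = F.interpolate(conv(f), size=target_size, mode="nearest")
            fused = fused + w[i] * f
        return fused


class DepthwiseResidualBlock(nn.Module):
    """Residual block with a dilated depthwise convolution to enlarge the
    receptive field cheaply (a DWEncoder-like stand-in)."""

    def __init__(self, channels, dilation):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, kernel_size=3, padding=dilation,
                            dilation=dilation, groups=channels, bias=False)
        self.pw = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return x + F.relu(self.bn(self.pw(self.dw(x))))


class SingleOutNeck(nn.Module):
    """Multiple-in-single-out neck: multilevel features in, one map out."""

    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256,
                 dilations=(1, 2, 4, 8)):
        super().__init__()
        self.fuse = MultiLevelFusion(in_channels, out_channels)
        self.encoder = nn.Sequential(
            *[DepthwiseResidualBlock(out_channels, d) for d in dilations]
        )

    def forward(self, feats):
        # Fuse at the resolution of the coarsest level, then encode contexts.
        fused = self.fuse(feats, target_size=feats[-1].shape[-2:])
        return self.encoder(fused)


if __name__ == "__main__":
    c3 = torch.randn(1, 512, 80, 80)
    c4 = torch.randn(1, 1024, 40, 40)
    c5 = torch.randn(1, 2048, 20, 20)
    out = SingleOutNeck()([c3, c4, c5])
    print(out.shape)  # torch.Size([1, 256, 20, 20])
```

Stacking blocks with increasing dilation is one plausible way to extend the scale range of receptive fields on a single map; the paper's actual MFIM and DWEncoder designs may differ in their fusion and context-extraction details.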