Transformer-Based Multi-layer Feature Aggregation and Rotated Anchor Matching for Oriented Object Detection in Remote Sensing Images

Bibliographic Details
Published in Arabian journal for science and engineering (2011) Vol. 49; no. 9; pp. 12935-12951
Main Authors Jin, Chuan; Zheng, Anqi; Wu, Zhaoying; Tong, Changqing
Format Journal Article
Language English
Published Berlin/Heidelberg Springer Berlin Heidelberg 2024
Springer Nature B.V
Summary: Object detection has made significant progress in computer vision. However, challenges remain in detecting small, arbitrarily oriented, and densely distributed objects, especially in aerial remote sensing images. This paper presents MATDet, an end-to-end, Transformer-based encoder-decoder network designed for oriented object detection. The network employs multi-layer feature aggregation and rotated anchor matching to improve detection accuracy for small, oriented, and densely distributed objects. Specifically, the encoder encodes labeled image blocks using convolutional neural network (CNN) feature maps and fuses these blocks with higher-resolution multi-scale features through cross-layer connections, facilitating the extraction of global contextual information. The decoder then upsamples the encoded features, recovering the full spatial resolution of the feature maps to capture the local–global semantic features needed for accurate object localization. In addition, high-quality anchor box proposals are generated by refined convolution, and the convolved features are adaptively aligned to the anchor boxes to reduce redundant computation. MATDet achieves mAPs of 80.35%, 78.83%, 73.60%, and 98.01% on the DOTAv1.0, DOTAv1.5, DIOR, and HRSC2016 datasets, respectively, outperforming the baseline model for oriented object detection. These results support the feasibility and effectiveness of the proposed methods.
ISSN: 2193-567X, 1319-8025, 2191-4281
DOI: 10.1007/s13369-024-08892-z
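
Note on oriented boxes: the summary refers to oriented objects and rotated anchors without giving a parameterization. As a minimal sketch only, assuming the common (cx, cy, w, h, angle) representation with the angle in radians measured counter-clockwise, the following converts such a box to its four corner points. The function names and the angle convention are illustrative assumptions and are not taken from the MATDet paper.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_rad):
    """Return the four corners of an oriented box as (x, y) tuples.

    Assumed convention (not from the paper): (cx, cy) is the box center,
    w and h are the side lengths, and angle_rad rotates the box
    counter-clockwise about its center.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    half_w, half_h = w / 2.0, h / 2.0
    # Corner offsets in the box's own (unrotated) frame.
    local = [(-half_w, -half_h), (half_w, -half_h),
             (half_w, half_h), (-half_w, half_h)]
    # Rotate each offset, then translate by the center.
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in local]


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

Rotating a box changes its corner coordinates but not its area, which gives a quick sanity check: polygon_area(rotated_box_corners(0, 0, 4, 2, angle)) stays at w * h for any angle.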