Sampling Equivariant Self-Attention Networks for Object Detection in Aerial Images

Bibliographic Details
Published in: IEEE Transactions on Image Processing, Vol. 32, pp. 6413-6425
Main Authors: Yang, Guo-Ye; Li, Xiang-Li; Xiao, Zi-Kai; Mu, Tai-Jiang; Martin, Ralph R.; Hu, Shi-Min
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2023
Summary: Objects in aerial images show greater variations in scale and orientation than objects in other images, making them harder to detect using vanilla deep convolutional neural networks. Networks with sampling equivariance can adapt their sampling of input feature maps to object transformations, allowing a convolutional kernel to extract effective object features under different transformations. However, methods such as deformable convolutional networks provide sampling equivariance only under certain circumstances, because they sample by location. We propose sampling equivariant self-attention networks, which treat self-attention restricted to a local image patch as convolution sampling by masks instead of by locations, together with a transformation embedding module that further improves equivariant sampling. We also propose a novel randomized normalization module to enhance network generalization, and a quantitative evaluation metric to fairly compare the sampling equivariance of different models. Experiments show that our model provides significantly better sampling equivariance than existing methods without additional supervision, and can thus extract more effective image features. Our model achieves state-of-the-art results on the DOTA-v1.0, DOTA-v1.5, and HRSC2016 datasets without additional computation or parameters.
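
The abstract's central idea, treating self-attention over a local image patch as a convolution that samples by masks rather than by locations, can be illustrated with a short sketch. The PyTorch code below is a minimal illustration under our own assumptions (single head, stride-1 k x k windows, bias-free linear projections); it is not the authors' implementation, and the function name, shapes, and window handling are all hypothetical.

```python
import torch
import torch.nn.functional as F

def local_patch_self_attention(x, wq, wk, wv, k=3):
    """Self-attention restricted to a k x k neighborhood of each position.

    The softmax weights over each window act as a content-adaptive
    sampling *mask*, in contrast to deformable convolution, which
    adapts sampling *locations*. All names and shapes here are
    illustrative assumptions, not the paper's implementation.
    """
    B, C, H, W = x.shape
    q = torch.einsum('bchw,cd->bdhw', x, wq)   # queries, (B, D, H, W)
    kf = torch.einsum('bchw,cd->bdhw', x, wk)  # keys
    v = torch.einsum('bchw,cd->bdhw', x, wv)   # values
    D = q.shape[1]
    # Unfold keys/values into k*k local windows around each position.
    k_unf = F.unfold(kf, kernel_size=k, padding=k // 2)  # (B, D*k*k, H*W)
    v_unf = F.unfold(v, kernel_size=k, padding=k // 2)
    k_unf = k_unf.reshape(B, D, k * k, H * W)
    v_unf = v_unf.reshape(B, D, k * k, H * W)
    q = q.reshape(B, D, 1, H * W)
    # Attention logits per window position; the softmax over the
    # window dimension is the per-location sampling mask.
    logits = (q * k_unf).sum(dim=1) / D ** 0.5           # (B, k*k, H*W)
    mask = logits.softmax(dim=1)
    out = (v_unf * mask.unsqueeze(1)).sum(dim=2)         # (B, D, H*W)
    return out.reshape(B, D, H, W)

if __name__ == "__main__":
    x = torch.randn(2, 16, 32, 32)       # toy feature map
    w = torch.randn(16, 16)
    y = local_patch_self_attention(x, w, w, w, k=3)
    print(y.shape)                       # torch.Size([2, 16, 32, 32])
```

In this reading, each position's softmax weights form a k*k mask over its neighborhood, so the effective kernel adapts to image content; deformable convolution instead shifts the k*k sampling locations, which, as the abstract notes, yields sampling equivariance only under certain circumstances.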
ISSN: 1057-7149
EISSN: 1941-0042
DOI: 10.1109/TIP.2023.3327586