CSR-Net++: Rethinking Context Structure Representation Learning for Feature Matching
Published in: IEEE Transactions on Geoscience and Remote Sensing, Vol. 62, pp. 1-12
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024
Summary: Seeking good feature correspondences between two remote sensing (RS) images is an essential problem in RS and photogrammetry. Traditional approaches often require a predefined geometric transformation model or additional handcrafted descriptors, which significantly constrains their versatility. In this work, we adopt the recent context structure representation network (CSR-Net), which has shown promising performance on general feature matching problems, and propose modifications, named CSR-Net++, to overcome its main limitations. Specifically, CSR-Net relies on a PointNet-like geometry estimator for global preregistration, which is sensitive to large deformations. Moreover, CSR-Net learns local consensus representation through a fixed-size grid, which limits its space-aware capacity due to grid pixelwise max-pooling operations. To address these limitations, we first introduce a pruning layer for matching guided by global consensus, as opposed to relying on a geometric estimator. We then propose a modified context structure representation (CSR) learning module that learns consensus representation directly from points through an independent spatial location stream and a stand-alone visual stream (VS). This decomposition separates local consensus into positional consensus and visual consensus. The proposed dual-stream representation learning not only avoids introducing grid anchors but also provides visual contextual priors. To demonstrate the robustness and versatility of CSR-Net++, we conducted comprehensive experiments on diverse sets of real image pairs for general feature matching. The results demonstrate the superiority of CSR-Net++ in most matching scenarios, achieving a 0.47%-4.70% improvement in F-score on multimodal images over existing leading methods.
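The decomposition of local consensus into a positional stream and a visual stream, as described in the abstract, can be illustrated with a toy sketch. This is not the paper's actual implementation: the neighbour count `k`, the threshold `tau`, and all function names are illustrative assumptions. The idea shown is that a putative match is likely correct when its motion vector agrees with those of its spatial neighbours (positional consensus) and its two descriptors are similar (visual consensus).

```python
import math

def positional_consensus(matches, k=3):
    # matches: list of ((x1, y1), (x2, y2)) putative correspondences.
    # Score each match by how well the motion vectors of its k nearest
    # neighbours (in the first image) agree with its own motion vector.
    scores = []
    for i, ((x1, y1), (x2, y2)) in enumerate(matches):
        mv = (x2 - x1, y2 - y1)
        neigh = sorted(
            (j for j in range(len(matches)) if j != i),
            key=lambda j: (matches[j][0][0] - x1) ** 2
                        + (matches[j][0][1] - y1) ** 2,
        )[:k]
        agree = 0.0
        for j in neigh:
            (a1, b1), (a2, b2) = matches[j]
            nv = (a2 - a1, b2 - b1)
            # Gaussian-like agreement: 1 for identical motion, ~0 otherwise.
            agree += math.exp(-((mv[0] - nv[0]) ** 2 + (mv[1] - nv[1]) ** 2))
        scores.append(agree / max(len(neigh), 1))
    return scores

def visual_consensus(desc1, desc2):
    # Cosine similarity between the two descriptors of each match.
    sims = []
    for d1, d2 in zip(desc1, desc2):
        dot = sum(a * b for a, b in zip(d1, d2))
        n1 = math.sqrt(sum(a * a for a in d1))
        n2 = math.sqrt(sum(b * b for b in d2))
        sims.append(dot / (n1 * n2 + 1e-9))
    return sims

def prune(matches, desc1, desc2, tau=0.5):
    # Keep matches whose combined positional * visual score exceeds tau.
    pos = positional_consensus(matches)
    vis = visual_consensus(desc1, desc2)
    return [m for m, p, v in zip(matches, pos, vis) if p * v > tau]
```

On a small example with four matches sharing a common translation plus one outlier with a wildly different motion vector and a dissimilar descriptor, `prune` retains the four consistent matches and discards the outlier. The actual CSR-Net++ module learns both streams end to end rather than using hand-set scores like these.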
ISSN: 0196-2892, 1558-0644
DOI: 10.1109/TGRS.2024.3431008