Image Retrieval Using Object Semantic Aggregation Histogram


Bibliographic Details
Published in Cognitive Computation Vol. 15; no. 5; pp. 1736-1747
Main Authors Lu, Fen; Liu, Guang-Hai
Format Journal Article
Language English
Published New York: Springer US, 01.09.2023
Springer Nature B.V.
Summary: Simulating primates’ ability to make fine visual discriminations when extracting visual features remains a challenge. To address this issue, a novel method named the object semantic aggregation histogram is proposed. With this method, mid-level object features and high-level semantic features can be aggregated into a discriminative, compact representation for image retrieval. The method has three main highlights: (1) Two adaptive semantic kernels are proposed to bridge the object features and the semantic features; they serve as a connecting link for detailed object representation. (2) Both object and semantic features are aggregated to provide a discriminative representation that depicts the target objects, semantic cognition, and spatial layouts. This approach is consistent with the coarse-to-fine nature of perception in the visual hierarchy. (3) A simple yet generic method is introduced to implement dimensionality reduction and whitening; it allows various regularizations to be used when choosing the optimal compact representation. Experiments on benchmark datasets confirmed that the proposed method effectively improves retrieval performance in terms of mean average precision (mAP). Using a 128-dimensional representation, the mAPs of the proposed method were significantly greater than those of the CroW, SBA, DSFH, and DTFH methods by 0.042, 0.035, 0.061, and 0.029 on the Oxford5k dataset and by 0.019, 0.017, 0.083, and 0.034 on the Holidays dataset, respectively. The proposed method is an effective and competitive way of aggregating features from multiple levels of the visual hierarchy, and it requires no complex handcrafting or training.
ISSN: 1866-9956; 1866-9964
DOI: 10.1007/s12559-023-10143-6
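
The summary mentions reducing aggregated features to a compact (for example 128-dimensional) representation through dimensionality reduction and whitening with a choice of regularization. The record does not give the authors' formulation, so the following is only a minimal sketch of generic PCA whitening with NumPy, not the method from the article. The function names, the `eps` eigenvalue regularizer, and the random 512-dimensional input are all assumptions made for illustration.

```python
import numpy as np


def fit_pca_whitening(features, out_dim, eps=1e-5):
    """Fit a PCA-whitening projection on row-wise feature vectors.

    `eps` is a hypothetical regularizer added to the eigenvalues so that
    directions with tiny variance are not amplified excessively.
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # Sample covariance matrix of the centered features.
    cov = centered.T @ centered / (features.shape[0] - 1)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the out_dim directions with the largest variance.
    order = np.argsort(eigvals)[::-1][:out_dim]
    kept_vals = np.clip(eigvals[order], 0.0, None)
    # Scale each principal direction by 1 / sqrt(variance + eps).
    projection = eigvecs[:, order] / np.sqrt(kept_vals + eps)
    return mean, projection


def apply_pca_whitening(features, mean, projection):
    """Project, whiten, and L2-normalize feature vectors."""
    reduced = (features - mean) @ projection
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.maximum(norms, 1e-12)


# Illustrative data only: random vectors standing in for aggregated features.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(1000, 512))
mean, projection = fit_pca_whitening(descriptors, out_dim=128)
compact = apply_pca_whitening(descriptors, mean, projection)
```

In this sketch, a larger `eps` damps low-variance directions more strongly, which is one simple way a regularization choice can affect the final compact representation; the L2 normalization at the end makes dot products between rows of `compact` behave as cosine similarities for ranking.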