Two‐stage partial image‐text clustering (TPIT‐C)

Bibliographic Details
Published in: IET Computer Vision, Vol. 16, No. 8, pp. 694–708
Main Authors: Guo, Dongjin; Su, Xiaoming; Lian, Yahong; Liu, Limin; Wang, Haibo
Format: Journal Article
Language: English
Published: Stevenage: John Wiley & Sons, Inc., 01.12.2022

Summary: Deep multi‐modal clustering is a challenging task in data analysis, since it must learn a universal semantic representation that recovers the correct clusters from heterogeneous samples. However, most existing methods 1) lack an effective approach to obtaining a global representation of visual instances, which leaves a large semantic gap between the visual and textual spaces, and 2) hardly consider the partial multi‐modal setting, in which each instance is represented by only one modality; in reality, pairing information between modalities is not available for all instances. To tackle these issues, we propose a novel model called the Two‐Stage Partial Image‐Text Clustering (TPIT‐C) model. First, we build an interpretable reasoning network that extracts the salient regions and semantic concepts of a scene in order to generate global semantic concepts. Second, we construct an adversarial learning module that aligns textual and visual instances in a unified space by virtue of cycle‐consistency. Experimental evaluations on public unpaired multi‐modal datasets demonstrate the superior performance of the proposed method and the effectiveness of our algorithm on the partial image‐text clustering task.
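
The two stages described in the summary can be pictured with a compact sketch. Below is a minimal, illustrative PyTorch rendering of the idea, not the authors' implementation: the feature dimensions, the attention pooling in `global_visual`, the MLP mappers, and the cycle-loss weight `lam` are all assumptions, and the discriminator update step is omitted for brevity.

```python
import torch
import torch.nn as nn

# Hypothetical feature sizes; the abstract does not specify dimensions.
VIS_DIM, TXT_DIM, HID = 512, 300, 256

def mlp(d_in, d_out):
    # Small two-layer perceptron reused for mappers and discriminators.
    return nn.Sequential(nn.Linear(d_in, HID), nn.ReLU(), nn.Linear(HID, d_out))

def global_visual(regions):
    # Stage one, greatly simplified: attention-pool salient region features
    # into one global visual vector (a stand-in for the reasoning network).
    scores = torch.softmax(regions @ regions.mean(dim=0), dim=0)
    return (scores.unsqueeze(-1) * regions).sum(dim=0)

# Stage two: cross-modal mappers and per-modality discriminators.
G_v2t, G_t2v = mlp(VIS_DIM, TXT_DIM), mlp(TXT_DIM, VIS_DIM)
D_txt, D_vis = mlp(TXT_DIM, 1), mlp(VIS_DIM, 1)

bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

def generator_loss(v, t, lam=10.0):
    """Adversarial + cycle-consistency loss on unpaired batches v, t."""
    fake_t, fake_v = G_v2t(v), G_t2v(t)
    # Adversarial term: mapped features should look native to the target modality.
    adv = bce(D_txt(fake_t), torch.ones_like(D_txt(fake_t))) \
        + bce(D_vis(fake_v), torch.ones_like(D_vis(fake_v)))
    # Cycle term: crossing modalities and back should reconstruct the input.
    cyc = l1(G_t2v(fake_t), v) + l1(G_v2t(fake_v), t)
    return adv + lam * cyc

# Toy unpaired batches: no image-text pairing is assumed.
v = torch.stack([global_visual(torch.randn(36, VIS_DIM)) for _ in range(8)])
t = torch.randn(8, TXT_DIM)
generator_loss(v, t).backward()
```

The cycle term is what allows the alignment to be trained without paired instances: each modality only needs to be reconstructable after a round trip through the other space, which matches the partial setting where pairing information is unavailable.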
ISSN: 1751-9632, 1751-9640
DOI: 10.1049/cvi2.12117