Two‐stage partial image‐text clustering (TPIT‐C)
| Published in | IET Computer Vision, Vol. 16, no. 8, pp. 694–708 |
|---|---|
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | Stevenage: John Wiley & Sons, Inc (Wiley), 01.12.2022 |
| Subjects | |
| Summary | Deep multi‐modal clustering is a challenging task for data analysis, since it must learn a universal semantic representation to find correct clusters among heterogeneous samples. However, most existing methods 1) lack an effective approach to obtaining a global representation of visual instances, which results in a huge semantic gap between the visual and textual spaces, and 2) hardly consider the partial multi‐modal setting, where each instance is represented by only one modality; in reality, pairing information is not available for all instances. To tackle these issues, we propose a novel model called the Two‐Stage Partial Image‐Text Clustering (TPIT‐C) model. First, we build an interpretable reasoning network that extracts the salient regions and semantic concepts of a scene in order to generate global semantic concepts. Second, we construct an adversarial learning module that aligns textual and visual instances into a unified space by virtue of cycle‐consistency. Experimental evaluations on public unpaired multi‐modal datasets demonstrate that the proposed method outperforms existing approaches and confirm the effectiveness of our algorithm on the partial image‐text clustering task. |
|---|---|
| ISSN | 1751-9632; 1751-9640 |
| DOI | 10.1049/cvi2.12117 |
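
The abstract's second stage, aligning unpaired textual and visual features with an adversarial objective plus cycle‐consistency, follows the general CycleGAN recipe. Below is a minimal, hypothetical PyTorch sketch of that idea; the feature dimensions, the two-layer projectors, the single textual-side discriminator, and the loss weight `lam_cyc` are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical feature dimensions; the abstract does not specify them.
VIS_DIM, TXT_DIM, HID = 2048, 300, 512

def mlp(d_in, d_out):
    # Simple two-layer projector used for both mapping directions.
    return nn.Sequential(nn.Linear(d_in, HID), nn.ReLU(), nn.Linear(HID, d_out))

G_v2t = mlp(VIS_DIM, TXT_DIM)   # maps visual features toward the textual space
G_t2v = mlp(TXT_DIM, VIS_DIM)   # maps textual features back toward the visual space

# Discriminator tries to tell real textual features from translated visual ones.
D_txt = nn.Sequential(nn.Linear(TXT_DIM, HID), nn.LeakyReLU(0.2), nn.Linear(HID, 1))

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()

def alignment_losses(v_feat, t_feat, lam_cyc=10.0):
    """Adversarial + cycle-consistency losses for unpaired visual/textual batches."""
    fake_t = G_v2t(v_feat)   # translate visual -> textual space
    rec_v = G_t2v(fake_t)    # round trip: visual -> textual -> visual

    # Generator objective: the discriminator should label translations as real (1).
    adv = bce(D_txt(fake_t), torch.ones(fake_t.size(0), 1))
    # Cycle-consistency: the round trip should reconstruct the original features.
    cyc = l1(rec_v, v_feat)

    # Discriminator objective: real textual features vs. detached translations.
    d_loss = bce(D_txt(t_feat), torch.ones(t_feat.size(0), 1)) + \
             bce(D_txt(fake_t.detach()), torch.zeros(fake_t.size(0), 1))
    return adv + lam_cyc * cyc, d_loss
```

In a full cycle-consistent setup a symmetric text-to-visual direction, with its own discriminator and cycle term, would be trained alongside this one, alternating generator and discriminator optimizer steps; only one direction is shown here for brevity.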