Two‐stage partial image‐text clustering (TPIT‐C)
| Published in | IET Computer Vision, Vol. 16, no. 8, pp. 694–708 |
|---|---|
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | Stevenage: John Wiley & Sons, Inc (Wiley), 01.12.2022 |
| Subjects | |
| Summary | Deep multi‐modal clustering is a challenging task for data analysis, since it must learn a universal semantic representation to find correct clusters among heterogeneous samples. However, most existing methods 1) lack an effective approach to obtaining a global representation of visual instances, which results in a huge semantic gap between the visual and textual spaces, and 2) hardly consider the partial multi‐modal setting, where each instance is represented by only one modality; in reality, pairing information is not available for all instances. To tackle these issues, we propose a novel model called the Two‐Stage Partial Image‐Text Clustering (TPIT‐C) model. First, we build an interpretable reasoning network that extracts the salient regions and semantic concepts of a scene in order to generate global semantic concepts. Second, we construct an adversarial learning module that aligns textual and visual instances into a unified space by virtue of cycle‐consistency. Experimental evaluations on public unpaired multi‐modal datasets demonstrate that the proposed method outperforms existing approaches and confirm the effectiveness of our algorithm on the partial image‐text clustering task. |
|---|---|
| ISSN | 1751-9632; 1751-9640 |
| DOI | 10.1049/cvi2.12117 |
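
The abstract's second stage, aligning unpaired textual and visual features with an adversarial objective plus cycle‐consistency, follows the general CycleGAN recipe. Below is a minimal, hypothetical PyTorch sketch of that idea; the feature dimensions, the two-layer projectors, the single textual-side discriminator, and the loss weight `lam_cyc` are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical feature dimensions; the abstract does not specify them.
VIS_DIM, TXT_DIM, HID = 2048, 300, 512

def mlp(d_in, d_out):
    # Simple two-layer projector used for both mapping directions.
    return nn.Sequential(nn.Linear(d_in, HID), nn.ReLU(), nn.Linear(HID, d_out))

G_v2t = mlp(VIS_DIM, TXT_DIM)   # maps visual features toward the textual space
G_t2v = mlp(TXT_DIM, VIS_DIM)   # maps textual features back toward the visual space

# Discriminator tries to tell real textual features from translated visual ones.
D_txt = nn.Sequential(nn.Linear(TXT_DIM, HID), nn.LeakyReLU(0.2), nn.Linear(HID, 1))

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()

def alignment_losses(v_feat, t_feat, lam_cyc=10.0):
    """Adversarial + cycle-consistency losses for unpaired visual/textual batches."""
    fake_t = G_v2t(v_feat)   # translate visual -> textual space
    rec_v = G_t2v(fake_t)    # round trip: visual -> textual -> visual

    # Generator objective: the discriminator should label translations as real (1).
    adv = bce(D_txt(fake_t), torch.ones(fake_t.size(0), 1))
    # Cycle-consistency: the round trip should reconstruct the original features.
    cyc = l1(rec_v, v_feat)

    # Discriminator objective: real textual features vs. detached translations.
    d_loss = bce(D_txt(t_feat), torch.ones(t_feat.size(0), 1)) + \
             bce(D_txt(fake_t.detach()), torch.zeros(fake_t.size(0), 1))
    return adv + lam_cyc * cyc, d_loss
```

In a full cycle-consistent setup a symmetric text-to-visual direction, with its own discriminator and cycle term, would be trained alongside this one, alternating generator and discriminator optimizer steps; only one direction is shown here for brevity.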