Unsupervised Learning of Image Segmentation Based on Differentiable Feature Clustering

The usage of convolutional neural networks (CNNs) for unsupervised image segmentation was investigated in this study. Similar to supervised image segmentation, the proposed CNN assigns labels to pixels that denote the cluster to which the pixel belongs. In unsupervised image segmentation, however, n...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on image processing Vol. 29; pp. 8055 - 8068
Main Authors Kim, Wonjik, Kanezaki, Asako, Tanaka, Masayuki
Format Journal Article
LanguageEnglish
Published New York IEEE 01.01.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:The usage of convolutional neural networks (CNNs) for unsupervised image segmentation was investigated in this study. Similar to supervised image segmentation, the proposed CNN assigns labels to pixels that denote the cluster to which the pixel belongs. In unsupervised image segmentation, however, no training images or ground truth labels of pixels are specified beforehand. Therefore, once a target image is input, the pixel labels and feature representations are jointly optimized, and their parameters are updated by the gradient descent. In the proposed approach, label prediction and network parameter learning are alternately iterated to meet the following criteria: (a) pixels of similar features should be assigned the same label, (b) spatially continuous pixels should be assigned the same label, and (c) the number of unique labels should be large. Although these criteria are incompatible, the proposed approach minimizes the combination of similarity loss and spatial continuity loss to find a plausible solution of label assignment that balances the aforementioned criteria well. The contributions of this study are four-fold. First, we propose a novel end-to-end network of unsupervised image segmentation that consists of normalization and an argmax function for differentiable clustering. Second, we introduce a spatial continuity loss function that mitigates the limitations of fixed segment boundaries possessed by previous work. Third, we present an extension of the proposed method for segmentation with scribbles as user input, which showed better accuracy than existing methods while maintaining efficiency. Finally, we introduce another extension of the proposed method: unseen image segmentation by using networks pre-trained with a few reference images without re-training the networks. The effectiveness of the proposed approach was examined on several benchmark datasets of image segmentation.
ISSN:1057-7149
1941-0042
DOI:10.1109/TIP.2020.3011269