An Improved Co-Similarity Measure for Document Clustering

Bibliographic Details
Published in	2010 International Conference on Machine Learning and Applications, pp. 190-197
Main Authors	Hussain, S F; Bisson, G; Grimal, C
Format	Conference Proceeding
Language	English
Published	IEEE 01.12.2010
Subjects	Clustering algorithms; co-clustering; Complexity theory; Equations; Oceans; Sea measurements; Semantics; similarity measure; Strontium; text mining
Summary:	Co-clustering has been defined as a way to simultaneously organize subsets of instances and subsets of features in order to improve the clustering of both. In previous work, we proposed an efficient co-similarity measure that simultaneously computes two similarity matrices, one between objects and one between features, each built on the basis of the other. Here we propose a generalization of this approach by introducing a notion of pseudo-norm and a pruning algorithm. Our experiments show that this new algorithm significantly improves the accuracy of the results when using data obtained by either supervised or unsupervised feature selection, and that it outperforms other algorithms on various corpora.
ISBN:	1424492114 9781424492114
DOI:	10.1109/ICMLA.2010.35