Enhancing Text Document Clustering Using Non-negative Matrix Factorization and WordNet

Bibliographic Details
Published in: Journal of Information and Communication Convergence Engineering, Vol. 11, no. 4, pp. 241-246
Main Authors: Kim, Chul-Won; Park, Sun
Format: Journal Article
Language: English
Published: 한국정보통신학회, 31.12.2013
ISSN: 2234-8255, 2234-8883
DOI: 10.6109/jicce.2013.11.4.241

Summary: A classic document clustering technique may incorrectly assign documents to different clusters when documents that should belong to the same cluster share no terms. To overcome this problem, internal and external knowledge-based approaches have recently been used for text document clustering. However, the clustering results of these approaches are influenced by the inherent structure and the topical composition of the documents. Further, organizing knowledge into an ontology is expensive. In this paper, we propose a new enhanced text document clustering method using non-negative matrix factorization (NMF) and WordNet. The semantic terms extracted as cluster labels by NMF represent the inherent structure of a document cluster well. The proposed method also improves the quality of document clustering by using cluster labels and term weights based on the term mutual information of WordNet. The experimental results demonstrate that the proposed method achieves better performance than the other text clustering methods.
KCI Citation Count: 0
Bibliography: G704-SER000003196.2013.11.4.007
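
The summary describes using NMF to extract semantic terms as cluster labels. As a rough illustration of that one step only, the following minimal sketch factors a toy term-document matrix with the standard multiplicative update rules, assigns each document to its strongest factor, and reads the top-weighted terms of each factor as a label. The toy documents, the helper names (`matmul`, `transpose`, `nmf`), the choice of update rule, and all parameter values are assumptions made for this example and are not taken from the paper; the WordNet-based term weighting is not covered here.

```python
import random

def matmul(A, B):
    """Multiply two dense matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Approximate V (terms x docs) by W (terms x k) times H (k x docs).

    Uses multiplicative updates, which keep every entry non-negative.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    rng = random.Random(seed)
    n_terms, n_docs = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_terms)]
    H = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_docs)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_terms)]
    return W, H

# Toy corpus (assumed data): two finance documents, two football documents.
docs = [
    "stock market shares trading",
    "market trading shares investor",
    "football match goal team",
    "team goal football league",
]
vocab = sorted({w for d in docs for w in d.split()})
V = [[d.split().count(t) for d in docs] for t in vocab]  # term counts

k = 2
W, H = nmf(V, k)

# Each document goes to the factor with the largest coefficient in H.
assignments = [max(range(k), key=lambda c: H[c][j]) for j in range(len(docs))]
# The three highest-weighted terms of each column of W act as a cluster label.
labels = [sorted(vocab, key=lambda t: -W[vocab.index(t)][c])[:3] for c in range(k)]

for c in range(k):
    members = [j for j, a in enumerate(assignments) if a == c]
    print(f"cluster {c}: docs {members}, label terms {labels[c]}")
```

Note that this toy corpus is easy because each pair of related documents shares terms. The case the summary highlights, where related documents have no terms in common, is where an external resource such as WordNet would have to supply the missing connection.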