Enhancing Text Document Clustering Using Non-negative Matrix Factorization and WordNet
Published in | Journal of Information and Communication Convergence Engineering, Vol. 11, no. 4, pp. 241-246 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | 한국정보통신학회 (Korea Institute of Information and Communication Engineering), 31.12.2013 |
Subjects | |
ISSN | 2234-8255 2234-8883 |
DOI | 10.6109/jicce.2013.11.4.241 |
Summary: | A classic document clustering technique may incorrectly classify documents into different clusters when documents that should belong to the same cluster do not have any shared terms. Recently, to overcome this problem, internal and external knowledge-based approaches have been used for text document clustering. However, the clustering results of these approaches are influenced by the inherent structure and the topical composition of the documents. Further, the organization of knowledge into an ontology is expensive. In this paper, we propose a new enhanced text document clustering method using non-negative matrix factorization (NMF) and WordNet. The semantic terms extracted as cluster labels by NMF can represent the inherent structure of a document cluster well. The proposed method can also improve the quality of document clustering that uses cluster labels and term weights based on term mutual information of WordNet. The experimental results demonstrate that the proposed method achieves better performance than the other text clustering methods. KCI Citation Count: 0 |
---|---|
Bibliography: | G704-SER000003196.2013.11.4.007 |