Document Clustering Using Semantic Features and Fuzzy Relations

Bibliographic Details
Published in: Journal of Information and Communication Convergence Engineering, Vol. 11, no. 3, pp. 179-184
Main Authors: Kim, Chul-Won; Park, Sun
Format: Journal Article
Language: English
Published: The Korea Institute of Information and Communication Engineering (한국정보통신학회), 30.09.2013
ISSN: 2234-8255, 2234-8883
DOI: 10.6109/jicce.2013.11.3.179

Summary: Traditional clustering methods are usually based on the bag-of-words (BOW) model. A disadvantage of the BOW model is that it ignores the semantic relationships among terms in the data set. To resolve this problem, ontology or matrix factorization approaches are usually used. However, a major problem of the ontology approach is that it is usually difficult to find a comprehensive ontology that covers all the concepts mentioned in a collection. This paper proposes a new document clustering method that uses semantic features and fuzzy relations to address the problems of the ontology and matrix factorization approaches. The proposed method can improve the quality of document clustering because it uses fuzzy relation values between semantic features and terms to distinguish clearly among dissimilar documents in clusters. The selected cluster label terms can better represent the inherent structure of a document set by using semantic features based on non-negative matrix factorization (NMF), a technique used in document clustering. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.
KCI Citation Count: 0
Bibliography:G704-SER000003196.2013.11.3.001
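
The summary names non-negative matrix factorization (NMF) as the source of the semantic features used for clustering and for choosing cluster label terms. As a rough illustration of that idea only, the sketch below factors a toy term-document matrix with the standard multiplicative update rules, assigns each document to its strongest semantic feature, and picks the highest-weighted terms of each feature as labels. The toy data, the function name `nmf`, and all parameter values are assumptions made for this example. The fuzzy relation step is not specified in the summary and is not reproduced here, so this should not be read as the authors' method.

```python
import numpy as np

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix A (terms x docs) as W @ H
    using multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps   # term weights per semantic feature
    H = rng.random((k, n)) + eps   # feature weights per document
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical vocabulary and term counts; columns are documents.
terms = ["cluster", "fuzzy", "matrix", "goal", "match", "team"]
A = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 0, 1, 2],
], dtype=float)

W, H = nmf(A, k=2)

# Each document goes to the semantic feature with the largest weight.
doc_clusters = H.argmax(axis=0)

# Label each cluster with the two highest-weighted terms of its feature.
label_terms = [
    [terms[i] for i in np.argsort(W[:, j])[::-1][:2]]
    for j in range(W.shape[1])
]

print(doc_clusters)
print(label_terms)
```

In this toy setup the first two documents share one vocabulary and the last two share another, so a rank-2 factorization is expected to separate them. On real collections the number of features, the term weighting, and the initialization would all need to be chosen with care.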