A Combined Approach for Filter Feature Selection in Document Classification

Bibliographic Details
Published in: 2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 317-324
Main Authors: Le Nguyen, Hoai Nam; Ho Bao, Quoc
Format: Conference Proceeding; Journal Article
Language: English
Published: IEEE, 01.11.2015
Summary: For a large set of documents, the bag-of-words vector can reach thousands of features. Document classification faces many difficulties due to the high dimensionality of the bag-of-words vector. High dimensionality not only increases computation cost but also degrades the accuracy of the classification process. The aim of filter feature selection is to remove irrelevant features by selecting a subset of the original feature set. In this paper, we analyze two filter feature selection approaches: the frequency-based approach and the cluster-based approach. We propose a hybrid filter feature selection method that combines these approaches, named FCFS, in order to exploit their strong points. We experiment with FCFS and related filter feature selection methods such as CMFS, OCFS, CIIC, IG, and CHI on two datasets about news and medicine. Regarding Macro-F1, FCFS is superior to the other methods, while in terms of Micro-F1, FCFS shows comparable or even better performance than the other methods.
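
As an illustration of what a filter feature selection step looks like, the following minimal sketch scores bag-of-words terms with the chi-square statistic (CHI, one of the baseline methods named in the summary) and keeps the top-k terms. It is not the FCFS method proposed in the paper. The function names `chi2_scores` and `select_top_k`, the toy documents, and the choice of taking the maximum score over classes are assumptions made for this example only.

```python
from collections import Counter

def chi2_scores(docs, labels):
    """Score each term by its maximum chi-square statistic over all classes.

    Uses document-level presence/absence of a term (an assumption of this
    sketch), with the usual 2x2 contingency table per (term, class) pair.
    """
    n = len(docs)
    doc_sets = [set(d.lower().split()) for d in docs]
    vocab = set().union(*doc_sets)
    class_sizes = Counter(labels)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, n_cls in class_sizes.items():
            # a: in class, has term;  b: not in class, has term
            a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cls)
            b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != cls)
            c = n_cls - a            # in class, lacks term
            d = (n - n_cls) - b      # not in class, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores

def select_top_k(docs, labels, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi2_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Toy corpus (hypothetical), loosely mirroring the news/medicine setting.
docs = [
    "stocks market rally today",
    "market shares fall today",
    "patient drug trial today",
    "drug dose patient care",
]
labels = ["news", "news", "medicine", "medicine"]

selected = select_top_k(docs, labels, 3)
print(selected)
```

Terms that occur in only one class receive the highest scores, while a term such as "today" that is spread across both classes scores lower and is filtered out.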
ISSN: 1082-3409, 2375-0197
DOI: 10.1109/ICTAI.2015.56