A Combined Approach for Filter Feature Selection in Document Classification
Published in | 2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 317-324 |
---|---|
Main Authors | , |
Format | Conference Proceeding Journal Article |
Language | English |
Published | IEEE 01.11.2015 |
Subjects | |
Summary: | For a large set of documents, a bag-of-words vector can reach thousands of features. The high dimensionality of the bag-of-words vector creates many difficulties for document classification: it not only increases computation cost but also degrades classification accuracy. The aim of filter feature selection is to remove irrelevant features by selecting a subset of the original feature set. In this paper, we analyze two filter feature selection approaches: the frequency-based approach and the cluster-based approach. We propose a hybrid filter feature selection method, named FCFS, that combines these approaches in order to exploit their strengths. We evaluate FCFS against related filter feature selection methods such as CMFS, OCFS, CIIC, IG, and CHI on two datasets covering news and medicine. In terms of Macro-F1, FCFS is superior to the other methods; in terms of Micro-F1, FCFS shows comparable or even better performance. |
---|---|
ISSN: | 1082-3409 2375-0197 |
DOI: | 10.1109/ICTAI.2015.56 |
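
The summary describes filter feature selection as scoring features independently of any classifier and keeping a subset of the original set. The sketch below illustrates that general idea with a chi-square score, on the assumption that the "CHI" baseline named in the summary refers to the chi-square statistic. It is not the FCFS method, whose details are not given in this record. The toy documents, the labels, and the function names `chi_square_scores` and `select_top_k` are hypothetical.

```python
def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square statistic over all classes.

    docs   : list of token lists (one per document)
    labels : list of class labels, aligned with docs
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]          # term presence, not counts
    vocab = sorted(set().union(*doc_sets))
    classes = sorted(set(labels))
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in classes:
            pairs = list(zip(doc_sets, labels))
            # 2x2 contingency table of term presence vs. class membership
            a = sum(1 for s, y in pairs if term in s and y == cls)
            b = sum(1 for s, y in pairs if term in s and y != cls)
            c = sum(1 for s, y in pairs if term not in s and y == cls)
            d = n - a - b - c
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            chi = n * (a * d - b * c) ** 2 / denom if denom else 0.0
            best = max(best, chi)
        scores[term] = best
    return scores


def select_top_k(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus with two classes.
docs = [
    ["stock", "market", "rises"],
    ["market", "crash", "report"],
    ["patient", "drug", "trial"],
    ["drug", "dose", "report"],
]
labels = ["news", "news", "medicine", "medicine"]

scores = chi_square_scores(docs, labels)
selected = select_top_k(docs, labels, 2)
print(selected)
```

In this toy corpus, "market" and "drug" each occur in every document of one class and in none of the other, so they receive the highest scores, while "report" occurs equally in both classes and scores zero. A filter method would drop such a term before any classifier is trained.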