Feature Selection for Classification using Principal Component Analysis and Information Gain
Published in | Expert systems with applications Vol. 174; p. 114765 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | New York; Elsevier Ltd; 15.07.2021; Elsevier BV |
Summary: |
• Feature selection improves performance of machine learning algorithms.
• Feature selection with more n-tier techniques is simpler and more stable.
• A feature selection model that is not specific to any data set is widely applied.
Feature selection and classification have been widely applied in areas such as business, medicine and media. High dimensionality in datasets is one of the main challenges in data classification, data mining and sentiment analysis. Irrelevant and redundant attributes also increase the complexity of classification algorithms and degrade their operation, so the algorithms perform poorly. Some existing work uses all attributes for classification, some of which are insignificant for the task, which leads to poor performance. This paper therefore develops a hybrid filter model for feature selection based on principal component analysis and information gain. The hybrid model is then applied to support classification using machine learning techniques such as Naïve Bayes. Experimental results demonstrate that the hybrid filter model reduces data dimensions, selects appropriate feature sets, and reduces training time, hence providing better classification performance as measured by accuracy, precision and recall. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2021.114765 |
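
The summary names information gain as one of the two filter criteria but does not say how it is computed or how it is combined with principal component analysis. As a rough illustration only, the sketch below ranks discrete features by information gain, IG(Y; X) = H(Y) - H(Y | X), using the Python standard library. The function names and the toy data are hypothetical and are not taken from the paper, and the PCA step is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: weighted average of the per-group entropies.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (name, gain) pairs sorted by descending information gain."""
    gains = {name: information_gain(values, labels)
             for name, values in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration (not from the paper).
labels = ["spam", "spam", "ham", "ham"]
columns = {
    "has_link": [1, 1, 0, 0],  # separates the two classes perfectly
    "is_long": [1, 0, 1, 0],   # carries no information about the class
}
ranking = rank_features(columns, labels)
```

In a filter-style selector, features whose gain falls below a chosen threshold would be dropped before training a classifier such as Naïve Bayes; the threshold and the ordering relative to PCA are design choices the summary leaves open.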