A new feature selection method to improve the document clustering using particle swarm optimization algorithm

Bibliographic Details
Published in	Journal of computational science Vol. 25; pp. 456-466
Main Authors	Abualigah, Laith Mohammad; Khader, Ahamad Tajudin; Hanandeh, Essam Said
Format	Journal Article
Language	English
Published	Elsevier B.V. 01.03.2018
Subjects	Informative features; K-mean text clustering algorithm; Particle swarm optimization algorithm; Unsupervised feature selection; 99-00; 00-01
Summary:	• Adapts the PSO algorithm to the text feature selection problem. • Establishes a new feature selection method using the TF-IDF weighting scheme. • Applies K-mean text clustering to the selected features.
The large amount of text information on the Internet and in modern applications makes this volume of information difficult to handle. Text clustering is an appropriate tool for dealing with an enormous number of text documents by grouping them into coherent groups. Large document size decreases the effectiveness of text clustering. Moreover, text documents contain sparse and uninformative features (i.e., noisy, irrelevant, and unnecessary features), which further reduce its effectiveness. Feature selection is a primary unsupervised learning method used to select the informative text features and create a new subset of a document's features, which increases the effectiveness of the underlying clustering algorithm. Recently, several complex optimization problems have been solved successfully using metaheuristic algorithms. This paper proposes a novel feature selection method, namely, feature selection method using the particle swarm optimization (PSO) algorithm (FSPSOTC), which solves the feature selection problem by creating a new subset of informative text features. This subset can improve the performance of text clustering and reduce computational time. Experiments were conducted on six standard text datasets with varied characteristics that are commonly used in the text clustering domain. The results show that FSPSOTC enhanced the effectiveness of text clustering by working with a new subset of informative features. The proposed method is compared with two other well-known text feature selection algorithms: feature selection method using a genetic algorithm to improve the text clustering (FSGATC) and feature selection method using the harmony search algorithm to improve the text clustering (FSHSTC).
ISSN:	1877-7503 1877-7511
DOI:	10.1016/j.jocs.2017.07.018
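
The pipeline described in the Summary (TF-IDF weighting, PSO-based selection of a feature subset, then clustering on the reduced features) can be illustrated with a minimal sketch. This is not the authors' FSPSOTC implementation. The record does not give the objective function, particle encoding, or parameters, so everything below is an assumption: a binary PSO with a sigmoid transfer function, a mean-absolute-deviation fitness over the selected TF-IDF weights, the hypothetical function names `tf_idf`, `fitness`, and `select_features`, and the toy documents. The first particle is seeded with the full feature set so that the result is never worse than "no selection" under this assumed objective. The clustering step itself is omitted; the sketch stops at the reduced matrix that a K-mean step would consume.

```python
import math
import random


def tf_idf(docs):
    """Return (vocab, matrix) with matrix[i][j] = tf(i, j) * log(n / df(j))."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    matrix = [[d.count(t) * math.log(n / df[t]) for t in vocab] for d in docs]
    return vocab, matrix


def fitness(mask, matrix):
    """Assumed objective: mean absolute deviation of the selected weights,
    averaged over documents. An empty selection scores 0."""
    total = 0.0
    for row in matrix:
        sel = [w for w, m in zip(row, mask) if m]
        if not sel:
            return 0.0
        mean = sum(sel) / len(sel)
        total += sum(abs(w - mean) for w in sel) / len(sel)
    return total / len(matrix)


def select_features(matrix, n_particles=10, n_iter=30,
                    w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO sketch: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)
    dim = len(matrix[0])
    swarm = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    swarm[0] = [1] * dim  # baseline particle: keep every feature
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_fit = [fitness(p, matrix) for p in swarm]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(dim):
                # Standard velocity update, clamped to [-v_max, v_max].
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - swarm[i][j])
                     + c2 * rng.random() * (gbest[j] - swarm[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))
                # Sigmoid turns the velocity into the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                swarm[i][j] = 1 if rng.random() < prob else 0
            fit = fitness(swarm[i], matrix)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = swarm[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = swarm[i][:], fit
    return gbest, gbest_fit


# Toy corpus (made up for illustration only).
docs = [
    "swarm particle optimization search".split(),
    "particle swarm velocity search".split(),
    "cluster document text search".split(),
    "text document cluster centroid".split(),
]
vocab, matrix = tf_idf(docs)
mask, score = select_features(matrix)
selected = [t for t, m in zip(vocab, mask) if m]
# Reduced document-term matrix that a clustering step would take as input.
reduced = [[wt for wt, m in zip(row, mask) if m] for row in matrix]
print(selected, round(score, 3))
```

Swapping in a different objective only requires replacing `fitness`; the swarm update does not depend on how a mask is scored.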