Document clustering analysis with aid of adaptive Jaro Winkler with Jellyfish search clustering algorithm
Published in | Advances in engineering software (1992) Vol. 175; p. 103322 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.01.2023 |
Subjects | |
Summary: |
• An Adaptive Jaro Winkler with Jellyfish Search Clustering (AJWJSC) algorithm is developed for document clustering.
• Four phases are considered: pre-processing, feature extraction, feature knowledge establishment, and document clustering.
• In the first phase, documents are pre-processed using tokenization, stop-word removal, and stemming.
• Relative Document-Term Frequency Difference (RD-TFD) and WordNet ontology features are used for feature extraction.
• The Chimp Optimization Algorithm (COA) is used for feature selection.
• Document clustering is carried out using the proposed AJWJSC algorithm.
• The method is tested on three datasets: the Reuters database, 20 Newsgroups, and the TDT2 database.
• Statistical measures such as precision, recall, F-measure, and accuracy are determined.
• The proposed method is implemented in MATLAB and compared with k-means clustering, the Krill Herd (KH) algorithm, and the Moth Flame Optimization (MFO) algorithm.
This research analyzes document clustering using the Adaptive Jaro Winkler with Jellyfish Search Clustering (AJWJSC) algorithm and the Chimp Optimization Algorithm (COA). The main aim is to identify relevant topics in a simple way while reducing the complexity of domain analysis. The document retrieval process is analyzed for detecting and identifying recent topics. The documents are analyzed in four stages: pre-processing, feature extraction, feature knowledge establishment, and document clustering. First, the documents are pre-processed using tokenization, stop-word removal, and stemming. Next, the Relative Document-Term Frequency Difference (RD-TFD) technique is used to extract features. From the extracted feature set, the essential features are selected using COA. Document clustering is then performed with the AJWJSC algorithm. The novelty of the work lies in clustering documents so that the results can be used in different applications. The proposed method is implemented in MATLAB and evaluated on three datasets: the Reuters database, 20 Newsgroups, and the Topic Detection and Tracking (TDT2) database. Performance is assessed using statistical measures including precision, recall, F-measure, accuracy, and efficiency. The clustered documents are validated based on their similarity, and the validated clusters are used for document retrieval. The proposed technique is compared with k-means clustering, the Krill Herd (KH) algorithm, and the Moth Flame Optimization (MFO) algorithm. |
---|---|
ISSN: | 0965-9978 |
DOI: | 10.1016/j.advengsoft.2022.103322 |
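
The summary names Jaro Winkler similarity as the basis of the clustering step, but this record does not describe how the adaptive variant works. As a rough illustration only, the sketch below implements the standard, non-adaptive Jaro-Winkler string similarity in Python. The function names `jaro` and `jaro_winkler` and the defaults `prefix_scale=0.1` and `max_prefix=4` are assumptions that follow the conventional definition; they are not taken from the paper, which reports a MATLAB implementation.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1] (assumed textbook definition)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are equal and lie
    # within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order,
    # then halve to get the number of transpositions.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3.0


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings sharing a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


if __name__ == "__main__":
    # Compare two stemmed terms; higher values mean more similar strings.
    print(round(jaro_winkler("cluster", "clustr"), 4))
```

A score like this compares terms or short strings pairwise. How the paper adapts it and combines it with Jellyfish Search to form clusters is not stated in this record, so that part is not sketched here.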