Building Words Dictionary List Using Symbol Enumeration and Hashing Methodology

Bibliographic Details
Published in Research Journal of Applied Sciences, Engineering and Technology Vol. 13; no. 12; pp. 885-894
Main Authors Safa S. Abdul-Jabbar, Loay E. George
Format Journal Article
Language English
Published Maxwell Science Publishing 15.12.2016
Summary: This study introduces a method for reducing the time needed by text retrieval systems. It builds a word dictionary that takes advantage of string enumeration, a multi-hashing methodology, stop-word extraction and word stemming. Dictionary-based text mining plays an important role in understanding and analyzing the large text datasets used in searching, matching and information retrieval systems. All of these systems deal mainly with strings (i.e., an undefined number of alphabet characters in each word and an undefined number of words in a sentence) and with text processing operations. This significantly affects their execution time because of hidden overhead operations (such as symbol-matching calculations and character conversion operations). Some of the experimental results obtained for these operations are provided, comparing the proposed method with the traditional method, which deals directly with strings only. The comparison is given for each step to show the advantage of the proposed approach. The results demonstrate that the proposed approach reduces the execution time of each step, which in turn improves the overall execution time of the whole system while maintaining the accuracy of the operations.
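The general idea in the summary (replace string handling with integer codes, drop stop-words, stem, and store the result in a hashed dictionary) can be illustrated with a minimal sketch. This is not the authors' implementation: the base-27 encoding in `enumerate_word`, the tiny `STOP_WORDS` and `SUFFIXES` lists, and the function names are all assumptions made for illustration, and Python's built-in `dict` (a single hash table) stands in for the multi-hashing scheme the paper describes.

```python
import re

# Hypothetical stop-word and suffix lists; the paper's actual lists are not given here.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}
SUFFIXES = ("ing", "ed", "s")


def enumerate_word(word):
    """Map a lowercase a-z word to an integer using a base-27 positional code."""
    code = 0
    for ch in word:
        code = code * 27 + (ord(ch) - ord("a") + 1)
    return code


def naive_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_dictionary(text):
    """Return {integer code: stem} for every non-stop-word in the text."""
    table = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOP_WORDS:
            continue
        stem = naive_stem(token)
        # The dict hashes the integer key, so later lookups compare
        # integers rather than matching strings character by character.
        table[enumerate_word(stem)] = stem
    return table
```

Under these assumptions, `build_dictionary("The words and the word")` yields a single entry, because both "words" and "word" reduce to the same stem and therefore the same integer code.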
ISSN: 2040-7467
2040-7459
DOI: 10.19026/rjaset.13.3761