Research on the Construction and Filter Method of Stop-word List in Text Preprocessing

Bibliographic Details
Published in	2011 International Conference on Intelligent Computation Technology and Automation Vol. 1; pp. 217 - 221
Main Authors	Zhou Yao, Cao Ze-wen
Format	Conference Proceeding
Language	English
Published	IEEE 01.03.2011
Subjects	Algorithm design and analysis; Filtering algorithms; hash algorithm; Indexes; Information filters; stop-word list; stopword filter; Switches; Text mining; text preprocessing
ISBN	1612842895 9781612842899
DOI	10.1109/ICICTA.2011.64

Summary:	In text preprocessing for text mining, a stop-word list is used to filter the word-segmentation results of text documents, giving a first reduction in the dimensionality of the text feature space. This paper summarizes the definition of stop words and the principles and methods for extracting them, and constructs a customized Chinese-English stop-word list from the classical stop-word list, adapted to the domain of the text documents. Three different filter algorithms were designed and implemented for stop-word filtering, and their efficiency was compared. The experiments indicated that the hash-filter method was the fastest.
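
This record does not describe the three filter algorithms in detail, so the following is only a minimal sketch of what a hash-based stop-word filter generally looks like, not the authors' implementation. It assumes the text has already been segmented into tokens. The function names `build_stopword_set` and `filter_stopwords` and the tiny mixed Chinese-English word sample are hypothetical and chosen purely for illustration.

```python
def build_stopword_set(words):
    """Build a hash set of stop words (illustrative helper, not from the paper)."""
    # Normalise case and drop blank entries so lookups are consistent.
    return {w.strip().lower() for w in words if w.strip()}


def filter_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stop-word set."""
    # Set membership is a hash lookup, so each token costs O(1) on average,
    # independent of how many stop words the list contains.
    return [t for t in tokens if t.lower() not in stopwords]


# Hypothetical sample list; a real list would be loaded from a file.
stopwords = build_stopword_set(["the", "of", "a", "的", "了"])
tokens = ["The", "dimensionality", "of", "的", "feature", "space"]
print(filter_stopwords(tokens, stopwords))
```

A plausible reason a hash-based filter would be fastest is that a linear scan of the list grows with the list's length for every token, while a hash lookup does not. The record itself reports only the outcome of the comparison.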