Research on the Construction and Filter Method of Stop-word List in Text Preprocessing

Bibliographic Details
Published in: 2011 International Conference on Intelligent Computation Technology and Automation, Vol. 1, pp. 217-221
Main Authors: Zhou Yao, Cao Ze-wen
Format: Conference Proceeding
Language: English
Published: IEEE, 01.03.2011
ISBN: 1612842895, 9781612842899
DOI: 10.1109/ICICTA.2011.64

Abstract: In the text preprocessing stage of text mining, a stop-word list is used to filter the word segmentation results of text documents, which gives an initial reduction in the dimensionality of the text feature space. This paper summarizes the definition of stop-words and the principles and methods for extracting them, and constructs a customized Chinese-English stop-word list from the classical stop-word list, adapted to the domain of the text documents. Three different filter algorithms were designed and implemented for stop-word filtering, with the comparison focusing on their efficiency. The experiments indicated that the hash-filter method was the fastest.
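
The abstract does not describe the three filter algorithms in detail, so the following is only a minimal sketch of what hash-based stop-word filtering can look like, next to a plain list scan as a comparison baseline. The sample stop-words, the token list, and the names `hash_filter` and `linear_filter` are assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch only: the stop-word entries and function names below
# are assumptions, not taken from the paper.
STOP_WORDS = ["the", "of", "a", "is", "的", "了", "是"]  # tiny mixed Chinese-English sample

# Hash-based lookup: a Python frozenset is backed by a hash table,
# so each membership test takes constant time on average.
STOP_WORD_SET = frozenset(STOP_WORDS)


def hash_filter(tokens):
    """Drop every token found in the hash-based stop-word set."""
    return [t for t in tokens if t not in STOP_WORD_SET]


def linear_filter(tokens):
    """Baseline for comparison: scan the stop-word list for every token."""
    return [t for t in tokens if t not in STOP_WORDS]


# Tokens as they might come out of a word segmenter (hypothetical input).
tokens = ["the", "dimensionality", "of", "文本", "的", "feature", "space"]
filtered = hash_filter(tokens)
```

Both functions return the same tokens; the difference is the cost per lookup, which grows with the size of the stop-word list for the list scan but stays roughly constant for the hash-based set.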
Author Zhou Yao
Cao Ze-wen
Author_xml – sequence: 1
  surname: Zhou Yao
  fullname: Zhou Yao
  email: zhou720yao@163.com
  organization: Sci. & Technol. on Inf. Syst. Eng. Lab., Nat. Univ. of Defense Technol., Changsha, China
– sequence: 2
  surname: Cao Ze-wen
  fullname: Cao Ze-wen
  email: zwcao1016@hotmail.com
  organization: Sci. & Technol. on Inf. Syst. Eng. Lab., Nat. Univ. of Defense Technol., Changsha, China
LCCN 2011920768
SubjectTerms Algorithm design and analysis
Filtering algorithms
hash algorithm
Indexes
Information filters
stop-word list
stopword filter
Switches
Text mining
text preprocessing
URI https://ieeexplore.ieee.org/document/5750595