Research on the Construction and Filter Method of Stop-word List in Text Preprocessing
Published in | 2011 International Conference on Intelligent Computation Technology and Automation Vol. 1; pp. 217 - 221 |
---|---|
Main Authors | Zhou Yao, Cao Ze-wen |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.03.2011 |
ISBN | 1612842895 9781612842899 |
DOI | 10.1109/ICICTA.2011.64 |
Abstract | In the text-preprocessing stage of text mining, a stop-word list is constructed to filter the word-segmentation results of text documents, giving a preliminary reduction in the dimensionality of the text feature space. This paper summarizes the definition, extraction principles and extraction methods of stop words, and constructs a customized Chinese-English stop-word list by adapting a classical stop-word list to the domain of the text documents. Three different filter algorithms are designed and implemented for stop-word filtering, and their efficiency is compared in detail. The experiment shows that the hash-filter method is the fastest. |
---|---|
Author | Zhou Yao; Cao Ze-wen |
Author affiliations | 1. Zhou Yao (zhou720yao@163.com), Sci. & Technol. on Inf. Syst. Eng. Lab., Nat. Univ. of Defense Technol., Changsha, China; 2. Cao Ze-wen (zwcao1016@hotmail.com), Sci. & Technol. on Inf. Syst. Eng. Lab., Nat. Univ. of Defense Technol., Changsha, China |
EndPage | 221 |
ExternalDocumentID | 5750595 |
Genre | orig-research |
IsPeerReviewed | false |
IsScholarly | false |
LCCN | 2011920768 |
PageCount | 5 |
PublicationDate | 2011-March |
PublicationTitle | 2011 International Conference on Intelligent Computation Technology and Automation |
PublicationTitleAbbrev | icicta |
PublicationYear | 2011 |
Publisher | IEEE |
StartPage | 217 |
SubjectTerms | Algorithm design and analysis; Filtering algorithms; hash algorithm; Indexes; Information filters; stop-word list; stopword filter; Switches; Text mining; text preprocessing |
Title | Research on the Construction and Filter Method of Stop-word List in Text Preprocessing |
URI | https://ieeexplore.ieee.org/document/5750595 |
Volume | 1 |
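The abstract names three stop-word filter algorithms but identifies only one of them (the hash filter, reported as the fastest). The sketch below is a minimal Python illustration of why a hash lookup tends to win, under stated assumptions: the other two strategies are assumed here to be a linear scan of the list and a binary search over a sorted list; the classical list, the domain words, and every function name are hypothetical; and the input is assumed to be already word-segmented (whitespace splitting stands in for Chinese word segmentation). It is not the authors' implementation.

```python
import bisect
import timeit

# Hypothetical "classical" stop-word list (English and Chinese function words);
# the list actually used in the paper is not reproduced in this record.
CLASSICAL_STOPWORDS = ["a", "an", "and", "of", "the", "to", "的", "了", "是"]


def build_stopword_list(classical, domain_words):
    """Merge a classical list with domain-specific stop words (assumed approach).

    Returns a deduplicated, sorted list; sorting lets the binary-search filter use it.
    """
    return sorted(set(classical) | set(domain_words))


def filter_linear(tokens, stopwords):
    """Drop stop words by scanning the list for every token: O(n) per lookup."""
    return [t for t in tokens if t not in stopwords]  # `in` on a list is a linear scan


def filter_binary(tokens, sorted_stopwords):
    """Drop stop words by binary search over a sorted list: O(log n) per lookup."""
    kept = []
    for t in tokens:
        i = bisect.bisect_left(sorted_stopwords, t)
        if i < len(sorted_stopwords) and sorted_stopwords[i] == t:
            continue  # t is a stop word, so skip it
        kept.append(t)
    return kept


def filter_hash(tokens, stopword_set):
    """Drop stop words by hash-set membership: O(1) average per lookup."""
    return [t for t in tokens if t not in stopword_set]


if __name__ == "__main__":
    # Hypothetical domain-specific stop words, padded with synthetic entries
    # so the list is large enough for lookup cost to matter.
    domain = ["figure", "table"] + [f"w{i}" for i in range(5000)]
    stop_list = build_stopword_list(CLASSICAL_STOPWORDS, domain)
    stop_set = set(stop_list)

    tokens = "the hash filter of the stop word list and the table".split()
    result = filter_hash(tokens, stop_set)
    # All three strategies must agree on which tokens survive.
    assert filter_linear(tokens, stop_list) == filter_binary(tokens, stop_list) == result
    print(result)  # ['hash', 'filter', 'stop', 'word', 'list']

    for name, fn, arg in [
        ("linear", filter_linear, stop_list),
        ("binary", filter_binary, stop_list),
        ("hash", filter_hash, stop_set),
    ]:
        seconds = timeit.timeit(lambda: fn(tokens, arg), number=100)
        print(f"{name:>6}: {seconds:.4f} s for 100 runs")
```

The printed timings depend on the machine and on the size of the stop-word list; with only a handful of entries the three approaches differ little. They illustrate the asymptotic argument only and are not the paper's experimental results.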