A feature selection algorithm with redundancy reduction for text classification
Published in | 2007 22nd International Symposium on Computer and Information Sciences, pp. 1-6 |
---|---|
Main Authors | Saleh, S.N.; El Sonbaty, Y. |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.11.2007 |
ISBN | 142441363X, 9781424413638 |
DOI | 10.1109/ISCIS.2007.4456849 |
Abstract | Document classification assigns documents to predefined categories according to their content. One of its main problems is the high dimensionality of the data. Feature selection addresses this problem by reducing the number of features, which in turn improves classification accuracy. This paper presents a new algorithm for multi-label document classification. The algorithm focuses on reducing redundant features using the concept of minimal redundancy maximal relevance (mRMR), which is based on the mutual information measure. The selected features are then input to one of two classifiers: a multinomial naive Bayes classifier or a linear-kernel support vector machine. Experimental results on the Reuters dataset show that the proposed algorithm outperforms some recent algorithms from the literature on measures such as the F1-measure and the break-even point. |
---|---|
Author | Saleh, S.N. (Arab Acad. for Sci., Alexandria); El Sonbaty, Y. |
ContentType | Conference Proceeding |
DOI | 10.1109/ISCIS.2007.4456849 |
EISBN | 9781424413645 1424413648 |
EndPage | 6 |
ExternalDocumentID | 4456849 |
Genre | orig-research |
ISBN | 142441363X 9781424413638 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
PageCount | 6 |
PublicationCentury | 2000 |
PublicationDate | 2007-Nov. |
PublicationDateYYYYMMDD | 2007-11-01 |
PublicationDecade | 2000 |
PublicationTitle | 2007 22nd International Symposium on Computer and Information Sciences |
PublicationTitleAbbrev | ISCIS |
PublicationYear | 2007 |
Publisher | IEEE |
StartPage | 1 |
SubjectTerms | Educational institutions; Frequency; Gain measurement; Kernel; Mutual information; Performance evaluation; Support vector machine classification; Support vector machines; Testing; Text categorization |
Title | A feature selection algorithm with redundancy reduction for text classification |
URI | https://ieeexplore.ieee.org/document/4456849 |
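
The abstract names the minimal redundancy maximal relevance (mRMR) concept but gives no details of the proposed algorithm. As a rough illustration only, the sketch below shows the generic greedy mRMR criterion (relevance to the label minus mean mutual information with the features already chosen), not the authors' method. The function names, the binary term-occurrence features, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(features, labels, k):
    """Greedy generic mRMR (hypothetical helper, not the paper's algorithm).

    features: dict mapping a feature name to its list of discrete values,
              one value per document
    labels:   list of class labels, one per document
    Returns up to k feature names in the order they were selected.
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(features[f], features[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy

        # Iterate candidates in sorted order so ties break by name
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (invented): 8 documents, one binary category label.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "oil": [1, 1, 1, 0, 0, 0, 0, 0],      # relevant term
    "oil_dup": [1, 1, 1, 0, 0, 0, 0, 0],  # exact copy of "oil" (fully redundant)
    "price": [0, 0, 1, 1, 0, 0, 0, 1],    # weakly relevant, nearly independent of "oil"
}
print(mrmr_select(features, labels, 2))  # ['oil', 'price']
```

Ranking by relevance alone would keep both `oil` and `oil_dup`. The redundancy term is what pushes the duplicate below the weaker but complementary `price` feature. For a multi-label corpus, one simple assumption would be to run the selection once per category with a binary label.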