A feature selection algorithm with redundancy reduction for text classification

Bibliographic Details
Published in 2007 22nd International Symposium on Computer and Information Sciences, pp. 1-6
Main Authors Saleh, S.N.; El Sonbaty, Y.
Format Conference Proceeding
Language English
Published IEEE 01.11.2007
ISBN 142441363X, 9781424413638
DOI 10.1109/ISCIS.2007.4456849

Abstract Document classification assigns documents to predefined categories according to their content. One of its main problems is the high dimensionality of the data. Feature selection addresses this problem by reducing the number of features, which in turn improves classification accuracy. This paper presents a new feature selection algorithm for multi-label document classification. The algorithm focuses on reducing redundant features using the concept of minimal redundancy maximal relevance (mRMR), which is based on the mutual information measure. The selected features are then passed to one of two classifiers: the multinomial naive Bayes classifier or a linear-kernel support vector machine. Experimental results on the Reuters dataset show that the proposed algorithm outperforms some recent algorithms from the literature on measures such as the F1-measure and the break-even point.
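
To make the minimal redundancy maximal relevance idea concrete, the following is a minimal sketch of the generic greedy mRMR criterion (relevance to the class minus mean redundancy with already selected features), both measured by mutual information. It is not the authors' algorithm, whose details are not given in this record. The function names `mutual_information` and `mrmr_select`, the toy term names, and the eight-document data set are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi


def mrmr_select(features, labels, k):
    """Greedily pick k feature names by relevance minus mean redundancy.

    features: dict mapping feature name -> list of discrete values, one per document
    labels:   list of class labels, one per document
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 documents, binary term presence, binary class label.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "oil":   [1, 1, 1, 0, 0, 0, 0, 0],  # most relevant term
    "crude": [1, 1, 0, 0, 0, 0, 0, 0],  # relevant, but largely repeats "oil"
    "trade": [0, 0, 0, 1, 0, 0, 0, 0],  # weakly relevant, complements "oil"
}

by_relevance = sorted(features,
                      key=lambda f: mutual_information(features[f], labels),
                      reverse=True)[:2]
chosen = mrmr_select(features, labels, 2)
print(by_relevance)  # ['oil', 'crude']
print(chosen)        # ['oil', 'trade']
```

In this toy case, ranking by relevance alone keeps the near-duplicate term "crude", whereas the redundancy penalty prefers "trade", which covers the one positive document that "oil" misses. The selected columns could then be fed to a classifier such as multinomial naive Bayes or a linear SVM, as the abstract describes.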
EISBN 9781424413645
1424413648
Subjects Educational institutions; Frequency; Gain measurement; Kernel; Mutual information; Performance evaluation; Support vector machine classification; Support vector machines; Testing; Text categorization
URI https://ieeexplore.ieee.org/document/4456849