Document Categorization with Entropy Based TF/IDF Classifier

Bibliographic Details
Published in	2009 WRI Global Congress on Intelligent Systems Vol. 4; pp. 269 - 273
Main Authors	Yi-hong Lu, Yan Huang
Format	Conference Proceeding
Language	English
Published	IEEE 01.05.2009
Subjects	Availability; Clustering algorithms; Computer science; Educational institutions; Entropy; Information filtering; Information filters; Intelligent systems; Mutual gain information; Mutual information; Text categorization; TFIDF

Summary:	Text categorization is the task of assigning a given text document to one or more predefined categories. The high availability of digital data requires methods for processing it automatically, and the day-by-day growth of this data creates a need for faster and better text classifiers. This paper focuses on classifying data in the context of text categorization. It reports a study conducted on the 20 Newsgroups dataset, using TFIDF for document categorization. Feature selection is then added to improve the categorization. The results achieved using this algorithm are very promising compared with conventional methods whose features are chosen on the basis of bag-of-words text.
ISBN:	9780769535715 0769535712
ISSN:	2155-6083 2155-6091
DOI:	10.1109/GCIS.2009.311