Web-Based Document Classification Using a Trie-Based Index Structure

Bibliographic Details
Published in	Proceedings of the 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops, pp. 52-55
Main Authors	Park, Jeahyun; Park, Juyoung; Choi, Joongmin
Format	Conference Proceeding
Language	English
Published	Washington, DC, USA IEEE Computer Society 02.11.2007
Series	ACM Conferences
Subjects	Applied computing > Document management and text processing; Computing methodologies > Machine learning > Learning paradigms > Supervised learning > Supervised learning by classification; Computing methodologies > Machine learning > Machine learning approaches; Computing methodologies > Machine learning > Machine learning approaches > Classification and regression trees; Information systems > Information retrieval; Information systems > Information storage systems; document classification; Web-based classification interface; trie index structure
Summary:	An automatic document classification system is useful for managing massive quantities of documents, such as the Web document collection. However, its complicated classification process has become a serious obstacle to applying it in general services. In this paper, we propose an efficient data structure for document classification and develop a classification system based on a trie-based index structure. This data structure reduces the overhead of document classification with naive Bayesian probabilistic models and makes it possible to implement commercial applications. In our system, both learning and classification are performed through a Web-based user interface rather than by a remote application, which makes the classification process easy to control and gives flexibility in providing diverse documents.
ISBN:	0769530281 9780769530284
DOI:	10.5555/1339264.1339657
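The summary says only that a trie-based index supports naive Bayesian classification. It does not describe the node layout. The following is therefore a minimal sketch under assumed design choices, not the authors' structure. It uses a character trie in which the node ending each term stores that term's per-class frequency, and that trie serves as the lookup table for a multinomial naive Bayes classifier with Laplace (add-one) smoothing. The class and method names (`TrieNBIndex`, `learn`, `classify`) are hypothetical. The choice to skip terms never seen in training is also an assumption.

```python
import math
from collections import defaultdict


class TrieNode:
    __slots__ = ("children", "class_counts")

    def __init__(self):
        self.children = {}  # next character -> child node
        # Per-class frequency of the term that ends at this node;
        # stays empty for nodes that are only prefixes of learned terms.
        self.class_counts = defaultdict(int)


class TrieNBIndex:
    """Hypothetical trie-backed term index for multinomial naive Bayes."""

    def __init__(self):
        self.root = TrieNode()
        self.doc_counts = defaultdict(int)   # training documents per class
        self.term_totals = defaultdict(int)  # total term occurrences per class
        self.vocab_size = 0                  # distinct terms seen in training

    def _find(self, term, create):
        # Walk the trie one character at a time; optionally create missing nodes.
        node = self.root
        for ch in term:
            child = node.children.get(ch)
            if child is None:
                if not create:
                    return None
                child = node.children[ch] = TrieNode()
            node = child
        return node

    def learn(self, tokens, label):
        """Add one labeled training document (a list of terms)."""
        self.doc_counts[label] += 1
        for term in tokens:
            node = self._find(term, create=True)
            if not node.class_counts:  # first time this term is stored
                self.vocab_size += 1
            node.class_counts[label] += 1
            self.term_totals[label] += 1

    def classify(self, tokens):
        """Return the class with the highest log posterior (add-one smoothing)."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = self.term_totals[label] + self.vocab_size
            for term in tokens:
                node = self._find(term, create=False)
                if node is None or not node.class_counts:
                    continue  # assumption: ignore terms unseen in training
                count = node.class_counts.get(label, 0)
                score += math.log((count + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    idx = TrieNBIndex()
    idx.learn("cheap pills buy now".split(), "spam")
    idx.learn("meeting agenda for monday".split(), "ham")
    print(idx.classify("buy cheap pills".split()))  # → spam
```

Because terms that share a prefix share trie nodes, lookups cost time proportional to the term length rather than to the vocabulary size. That property is one plausible reason a trie would cut the per-document overhead the summary mentions, though the paper's own justification may differ.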