An Improved Text Categorization Algorithm Based on VSM
Published in | 2014 IEEE 17th International Conference on Computational Science and Engineering, pp. 1701-1706 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.12.2014 |
Summary: | With the advent of the information age, information of every kind spreads across the Internet, and the volume of junk information seriously affects people's lives. To filter harmful Web pages efficiently and effectively, this paper proposes a novel text classification algorithm based on the Vector Space Model (VSM). The algorithm processes Web pages in a modular fashion, introduces a feature selection proportion, and improves the traditional Term Frequency-Inverse Document Frequency (TF-IDF) weighting method. A simulation compares the algorithm with existing work; the comparison shows that it achieves higher accuracy and classification precision, and therefore better system performance and classification results. |
---|---|
DOI: | 10.1109/CSE.2014.313 |
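
For orientation, the sketch below shows the traditional TF-IDF weighting in a Vector Space Model, which is the baseline the summary says the paper improves on. It is not the paper's improved method, whose details this record does not give. The function names `tfidf_vectors` and `cosine`, the toy documents, and the choice of cosine similarity for comparing vectors are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (classic TF-IDF)."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Weight = relative term frequency * log(N / document frequency).
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: two junk-like texts and one unrelated text.
docs = [
    "cheap pills buy now".split(),
    "buy cheap watches now".split(),
    "conference on computational science".split(),
]
vecs = tfidf_vectors(docs)
```

In this toy corpus the two junk-like documents share weighted terms and so have a positive cosine similarity, while the unrelated document shares none and scores zero against them. A classifier built on this representation would typically assign a page to the category whose vectors it most resembles.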