Topic Distillation and Clustering Algorithm Based on the Topology of Pages-Keywords

Bibliographic Details
Published in: 2006 International Conference on Machine Learning and Cybernetics, pp. 1581-1586
Main Authors: Jian-Shuang Deng, Qi-Lun Zheng, Hong Peng
Format: Conference Proceeding
Language: English
Published: IEEE, 01.08.2006

More Information
Summary: The HITS algorithm has achieved great success and has been widely applied to Web link analysis. HITS is used to find authority pages and hub pages among search-engine results, and it can also be used to discover Web communities. However, because HITS relies on the hyperlinks between pages, it is prone to topic excursion (often called topic drift). HITS also requires a set of pages as the base set for its computation, so it cannot be applied to plain texts. This paper introduces a new algorithm, PK-TDC, which adopts the iterative idea of HITS. PK-TDC finds authority pages and keywords on a pages-keywords topology and clusters the pages according to the keywords they contain. Experiments show that PK-TDC performs significantly well at extracting subjects and clustering, not only for pages with hyperlinks but also for plain texts.
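The summary describes PK-TDC only at a high level. To illustrate the HITS-style mutual reinforcement it borrows, here is a minimal sketch under stated assumptions, not the authors' implementation. Pages and keywords are assumed to form a bipartite graph stored as a page-by-keyword weight matrix (for example, term counts). A keyword scores highly when high-scoring pages contain it, and a page scores highly when it contains high-scoring keywords. Each page is then labelled with the keyword it contains that maximises weight times keyword score. The matrix layout, the function names `pk_scores` and `cluster_pages`, the fixed iteration count, and the clustering rule are all hypothetical.

```python
import math


def _normalise(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def pk_scores(matrix, iterations=50):
    """HITS-style mutual reinforcement on a page-keyword bipartite graph.

    matrix[i][j] is an assumed non-negative weight of keyword j in page i
    (e.g. a term count). Returns (page_scores, keyword_scores), each with
    unit L2 norm.
    """
    n_pages = len(matrix)
    n_keys = len(matrix[0]) if n_pages else 0
    pages = [1.0] * n_pages
    keys = [1.0] * n_keys
    for _ in range(iterations):
        # A keyword gains weight when high-scoring pages contain it.
        keys = [sum(matrix[i][j] * pages[i] for i in range(n_pages))
                for j in range(n_keys)]
        # A page gains weight when it contains high-scoring keywords.
        pages = [sum(matrix[i][j] * keys[j] for j in range(n_keys))
                 for i in range(n_pages)]
        keys, pages = _normalise(keys), _normalise(pages)
    return pages, keys


def cluster_pages(matrix, keyword_scores):
    """Hypothetical rule: label each page with the keyword it contains that
    maximises weight * keyword score; pages with no keywords get None."""
    labels = []
    for row in matrix:
        present = [j for j, w in enumerate(row) if w > 0]
        if not present:
            labels.append(None)
        else:
            labels.append(max(present, key=lambda j: row[j] * keyword_scores[j]))
    return labels


if __name__ == "__main__":
    # Pages 0-1 mostly use keywords 0-1; pages 2-3 use keywords 2-3.
    M = [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 1, 1], [0, 0, 1, 2]]
    page_scores, keyword_scores = pk_scores(M)
    print(cluster_pages(M, keyword_scores))  # -> [0, 0, 3, 3]
```

On block-structured input like this example, power iteration concentrates the scores on the dominant group, so the smaller group's scores shrink toward zero. The labelling still separates the two groups because each page compares only the keywords it actually contains.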
ISBN: 1424400619, 9781424400614
ISSN: 2160-133X
DOI: 10.1109/ICMLC.2006.258833