Web Information Organization Using Keyword Distillation Based Clustering

Bibliographic Details
Published in: 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, Vol. 1, pp. 325-330
Main Authors: Shibata, Tomohide; Bamba, Yasuo; Shinzato, Keiji; Kurohashi, Sadao
Format: Conference Proceeding
Language: English
Published: Washington, DC, USA: IEEE Computer Society, 2009
IEEE
Series: ACM Conferences
Summary: This paper describes a system that clusters search results for several thousand Web pages and refines the cluster labels through keyword distillation. Keyword distillation is a method that handles spelling variations, transliterations, synonyms, inclusion relations, and word ambiguity by using linguistic resources and the context of the user's query. The system produces a clustering result for 1,000 pages in less than one minute by taking advantage of a search engine infrastructure and a grid computing environment. Experimental results show that the system correctly merges synonymous keywords and is useful for finding topics hidden in the lower-ranked pages of a search result.
ISBN: 0769538010, 9780769538013
DOI: 10.1109/WI-IAT.2009.57