Web Information Organization Using Keyword Distillation Based Clustering

Bibliographic Details
Published in	2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, Vol. 1, pp. 325-330
Main Authors	Shibata, Tomohide; Bamba, Yasuo; Shinzato, Keiji; Kurohashi, Sadao
Format	Conference Proceeding
Language	English
Published	Washington, DC, USA: IEEE Computer Society / IEEE, 2009
Series	ACM Conferences
Subjects	clustering; Clustering methods; Computing methodologies > Artificial intelligence > Knowledge representation and reasoning; Computing methodologies > Machine learning > Learning paradigms > Unsupervised learning > Cluster analysis; Conferences; Data mining; Educational products; Grid computing; Information systems > Information retrieval; Information systems > Information retrieval > Evaluation of retrieval results; Intelligent agent; keyword unification; Navigation; open search engine; Search engines; Web pages
Summary:	This paper describes a system that clusters search results over several thousand Web pages and refines the cluster labels through keyword distillation. Keyword distillation is a method that handles spelling variations, transliterations, synonyms, inclusion relations and word ambiguity, using linguistic resources and the context of a user's query. The system produces a clustering result from 1,000 pages in less than one minute by taking advantage of a search engine infrastructure and a grid computing environment. Experimental results show that the system correctly merged synonymous keywords and is useful for finding topics hidden in the lower-ranked pages of a search result.
ISBN:	0769538010 9780769538013
DOI:	10.1109/WI-IAT.2009.57