Topic Distillation and Spectral Filtering

Bibliographic Details
Published in: The Artificial Intelligence Review, Vol. 13, no. 5-6, pp. 409-435
Main Authors: Chakrabarti, Soumen; Dom, Byron E.; Gibson, David; Kumar, Ravi; Raghavan, Prabhakar; Rajagopalan, Sridhar; Tomkins, Andrew
Format: Journal Article
Language: English
Published: Dordrecht, Springer Nature B.V., 01.12.1999
Summary: This paper discusses topic distillation, an information retrieval problem that is emerging as a critical task for the WWW. Algorithms for this problem must distill a small number of high-quality documents addressing a broad topic from a large set of candidates. We give a review of the literature, and compare the problem with related tasks such as classification, clustering, and indexing. We then describe a general approach to topic distillation with applications to searching and partitioning, based on the algebraic properties of matrices derived from particular documents within the corpus. Our method - which we call spectral filtering - combines the use of terms, hyperlinks and anchor-text to improve retrieval performance. We give results for broad-topic queries on the WWW, and also give some anecdotal results applying the same techniques to US Supreme Court law cases, US patents, and a set of Wall Street Journal newspaper articles.
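
The summary does not spell out the computation behind spectral filtering. As a rough illustration only, the sketch below assumes a hub/authority style power iteration over a weighted link matrix, where each link weight stands in for how well the link's anchor text and surrounding terms match the topic. The function names, the weight values, and the four-page corpus are hypothetical and are not taken from the paper.

```python
import math


def normalize(scores):
    """Scale a score vector to unit Euclidean length (unchanged if all zero)."""
    norm = math.sqrt(sum(s * s for s in scores))
    return [s / norm for s in scores] if norm > 0 else scores


def spectral_filter(weights, iterations=50):
    """Hub/authority power iteration on a weighted link matrix.

    weights[i][j] is an assumed non-negative weight on the link from
    page i to page j (0.0 means no link). Returns (hubs, authorities).
    """
    n = len(weights)
    hubs = [1.0] * n
    auths = [1.0] * n
    for _ in range(iterations):
        # A page's authority score sums the hub scores of pages linking to it.
        auths = [sum(weights[i][j] * hubs[i] for i in range(n)) for j in range(n)]
        # A page's hub score sums the authority scores of pages it links to.
        hubs = [sum(weights[i][j] * auths[j] for j in range(n)) for i in range(n)]
        auths = normalize(auths)
        hubs = normalize(hubs)
    return hubs, auths


# Hypothetical corpus: pages 0 and 1 link to pages 2 and 3.
# The link 0 -> 2 carries a higher (made-up) topical weight.
weights = [
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
hubs, auths = spectral_filter(weights)
# Distill the topic by keeping the highest-scoring authority pages.
top_authorities = sorted(range(len(auths)), key=lambda j: auths[j], reverse=True)[:2]
```

Under these assumptions the iteration converges toward the principal eigenvectors of the products of the weight matrix with its transpose, which is one way "algebraic properties of matrices" can be turned into a ranking; the paper's actual weighting scheme may differ.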
ISSN: 0269-2821, 1573-7462
DOI: 10.1023/a:1006596506229