Method for identifying related pages in a hyperlinked database

Bibliographic Details
Main Authors: Black, Jeffrey Dean; Henzinger, Monika R; Broder, Andrei Z
Format: Patent
Language: English
Published: 08.12.2009
Summary: A method is described for identifying related pages among a plurality of pages in a linked database such as the World Wide Web. An initial page is selected from the plurality of pages. Pages linked to the initial page are represented as a graph in a memory. The pages represented in the graph are scored on content, and a set of pages is selected, the selected set of pages having scores greater than a first predetermined threshold. The selected set of pages is scored on connectivity, and a subset of the set of pages that have scores greater than a second predetermined threshold are selected as related pages.
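
The two-stage filtering described in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the summary does not say how content or connectivity is scored, so the Jaccard word overlap, the link-count connectivity score, and the names `related_pages`, `links`, and `text` are all assumptions made for illustration.

```python
def related_pages(initial, links, text, content_threshold, connectivity_threshold):
    """Hypothetical sketch of the two-stage selection in the summary.

    links: dict mapping each page to the set of pages it links to.
    text:  dict mapping each page to its textual content.
    """
    # Build the neighborhood graph: pages the initial page links to,
    # plus pages that link to the initial page.
    neighbors = set(links.get(initial, set()))
    neighbors |= {page for page, outs in links.items() if initial in outs}
    neighbors.discard(initial)

    # Stage 1: content score. Assumption: Jaccard overlap of word sets
    # between the initial page and each neighbor.
    base_words = set(text.get(initial, "").lower().split())

    def content_score(page):
        words = set(text.get(page, "").lower().split())
        union = base_words | words
        return len(base_words & words) / len(union) if union else 0.0

    candidates = {p for p in neighbors if content_score(p) > content_threshold}

    # Stage 2: connectivity score. Assumption: number of links between
    # the page and the other surviving pages (including the initial page).
    nodes = candidates | {initial}

    def connectivity_score(page):
        others = nodes - {page}
        out_links = len(links.get(page, set()) & others)
        in_links = sum(1 for q in others if page in links.get(q, set()))
        return out_links + in_links

    scored = {p: connectivity_score(p) for p in candidates}
    selected = [p for p, s in scored.items() if s > connectivity_threshold]
    # Highest connectivity first; ties broken by page name.
    return sorted(selected, key=lambda p: (-scored[p], p))


# Toy data, invented for this example.
links = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"b"},
    "d": set(),
    "e": {"a"},
}
text = {
    "a": "graph search ranking",
    "b": "graph search engine",
    "c": "graph ranking search",
    "d": "cooking recipes",
    "e": "graph search",
}
print(related_pages("a", links, text, 0.4, 2))  # → ['b', 'c']
```

In this toy run, page "d" is dropped by the content threshold, and page "e" passes on content but is dropped by the connectivity threshold because only one link connects it to the surviving pages.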