Distributed crawling of hyperlinked documents

Bibliographic Details
Main Authors: GOMES BENEDICT, GHEMAWAT SANJAY, DEAN JEFFREY A, SILVERSTEIN CRAIG
Format: Patent
Language: English
Published: 04.12.2007

Summary: Techniques for crawling hyperlinked documents are provided. Hyperlinked documents to be crawled are grouped by host, and the host to be crawled next is selected according to its stall time. The stall time can indicate the earliest time at which the host should next be crawled. Stall times can be a predetermined amount of time, can vary by host, and can be adjusted according to actual retrieval times from the host.
Bibliography: Application Number: US20000638082
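
The host-selection scheme described in the summary could be sketched roughly as follows. This is only an illustrative sketch, not the method claimed in the patent: the class name `HostScheduler`, the parameters `default_delay` and `load_factor`, and the rule that the stall interval is the larger of a fixed delay and a multiple of the last retrieval time are all assumptions made for the example.

```python
import heapq
from collections import deque
from urllib.parse import urlsplit


class HostScheduler:
    """Hypothetical scheduler: URLs grouped by host, hosts ordered by stall time."""

    def __init__(self, default_delay=1.0, load_factor=10.0):
        self.default_delay = default_delay  # assumed predetermined stall, in seconds
        self.load_factor = load_factor      # assumed multiplier on retrieval time
        self.queues = {}        # host -> deque of pending URLs
        self.stall = {}         # host -> earliest time the host may be crawled
        self.in_flight = set()  # hosts with a fetch currently outstanding
        self.heap = []          # (stall_time, host) for hosts that are ready to schedule

    def add_url(self, url, now=0.0):
        host = urlsplit(url).netloc
        queue = self.queues.setdefault(host, deque())
        # A host is on the heap only while it has pending URLs and no fetch in flight.
        if not queue and host not in self.in_flight:
            heapq.heappush(self.heap, (self.stall.get(host, now), host))
        queue.append(url)

    def next_url(self, now):
        """Return (host, url) for the host with the earliest stall time, or None."""
        if not self.heap or self.heap[0][0] > now:
            return None  # no host may be crawled yet
        _, host = heapq.heappop(self.heap)
        self.in_flight.add(host)
        return host, self.queues[host].popleft()

    def record_fetch(self, host, finished_at, retrieval_seconds):
        """Adjust the host's stall time using the actual retrieval time."""
        delay = max(self.default_delay, self.load_factor * retrieval_seconds)
        self.stall[host] = finished_at + delay
        self.in_flight.discard(host)
        if self.queues[host]:
            heapq.heappush(self.heap, (self.stall[host], host))
```

In this sketch, a caller would repeatedly call `next_url(now)`, fetch the returned URL, and then report the measured retrieval time through `record_fetch`, so that slow hosts are revisited less often than fast ones.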