Distributed crawling of hyperlinked documents
Main Authors | , , , |
---|---|
Format | Patent |
Language | English |
Published | 04.12.2007 |
Subjects | |
Summary: | Techniques for crawling hyperlinked documents are provided. Hyperlinked documents to be crawled are grouped by host, and the host to be crawled next is selected according to its stall time. The stall time can indicate the earliest time at which the host should be crawled. Stall times can be a predetermined amount of time, can vary by host, and can be adjusted according to actual retrieval times from the host. |
---|---|
Bibliography: | Application Number: US20000638082 |
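
The scheduling idea in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation. The class name `HostScheduler`, the parameters `default_stall` and `stall_factor`, and the rule "stall for at least a fixed delay, or a multiple of the last retrieval time" are all assumptions made for illustration. URLs are grouped into per-host queues, and a min-heap keyed on stall time yields the host that may be crawled next.

```python
import heapq
from collections import defaultdict, deque
from urllib.parse import urlsplit


class HostScheduler:
    """Hypothetical sketch: group URLs by host, pick the host with the earliest stall time."""

    def __init__(self, default_stall=1.0, stall_factor=2.0):
        self.default_stall = default_stall  # assumed predetermined delay (seconds)
        self.stall_factor = stall_factor    # assumed multiplier on retrieval time
        self.queues = defaultdict(deque)    # host -> URLs waiting to be crawled
        self.heap = []                      # entries are (stall_time, host)
        self.scheduled = set()              # hosts currently in the heap
        self.busy = set()                   # hosts with a fetch in progress
        self.stall_until = {}               # host -> earliest next crawl time

    def add(self, url, now=0.0):
        """Queue a URL under its host; schedule the host if it is idle."""
        host = urlsplit(url).netloc
        self.queues[host].append(url)
        if host not in self.scheduled and host not in self.busy:
            stall = max(now, self.stall_until.get(host, now))
            heapq.heappush(self.heap, (stall, host))
            self.scheduled.add(host)

    def next_url(self, now):
        """Return (host, url) if some host's stall time has passed, else None."""
        if not self.heap or self.heap[0][0] > now:
            return None
        _, host = heapq.heappop(self.heap)
        self.scheduled.discard(host)
        self.busy.add(host)
        return host, self.queues[host].popleft()

    def done(self, host, now, retrieval_time=None):
        """Record a finished fetch and compute the host's new stall time."""
        self.busy.discard(host)
        delay = self.default_stall
        if retrieval_time is not None:
            # Slow hosts are stalled longer (assumed adjustment rule).
            delay = max(delay, self.stall_factor * retrieval_time)
        self.stall_until[host] = now + delay
        if self.queues[host]:
            heapq.heappush(self.heap, (now + delay, host))
            self.scheduled.add(host)


scheduler = HostScheduler(default_stall=1.0, stall_factor=2.0)
for u in ("http://a.example/1", "http://a.example/2", "http://b.example/1"):
    scheduler.add(u)

first = scheduler.next_url(now=0.0)
scheduler.done("a.example", now=0.5, retrieval_time=3.0)  # a.example stalls until 6.5
second = scheduler.next_url(now=0.5)
scheduler.done("b.example", now=0.6, retrieval_time=0.1)
too_early = scheduler.next_url(now=1.0)
third = scheduler.next_url(now=6.5)
```

In this run the slow host `a.example` is held back until its adjusted stall time passes, while `b.example` is served in the meantime. A heap keeps selection of the next host at logarithmic cost, which is one plausible design choice among several.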