Building of a web corpus with the help of a reference web crawl

Bibliographic Details
Main Authors	GREHANT XAVIER, FERENCZI JIM, RICHARD SEBASTIEN
Format	Patent
Language	Chinese, English
Published	30.10.2013
Subjects	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

More Information
Summary:	Computer-implemented method for building a web corpus (WCD), comprising the steps of:
- sending, by a web crawler (WC), a query to a reference web crawl agent (RWCA), the query containing at least one identifier of a resource;
- receiving, by the web crawler (WC), a response from the reference web crawl agent (RWCA);
- if the response does not contain the resource identified by the identifier, downloading, by the web crawler (WC), the resource from the website (WS) corresponding to the identifier and adding the resource to the web corpus (WCD); and
- if the response contains the resource identified by the identifier, adding the resource to the web corpus (WCD).
Bibliography:	Application Number: CN201310209210
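The claimed method amounts to a reference-first crawl: the crawler consults the reference web crawl before contacting the live website, and downloads a resource only when the reference crawl cannot supply it. The Python sketch below illustrates that control flow under assumptions that are not taken from the patent: identifiers are URLs, the reference web crawl agent (RWCA) is modeled by a hypothetical `query()` method that returns a mapping from identifier to resource bytes, and the web corpus (WCD) is a plain dictionary. `ReferenceWebCrawlAgent`, `InMemoryRWCA` and `build_web_corpus` are illustrative names, not part of any published implementation.

```python
from typing import Callable, Dict, Iterable, Mapping, Optional, Protocol
import urllib.request


class ReferenceWebCrawlAgent(Protocol):
    """Hypothetical RWCA interface (an assumption, not from the patent):
    returns the subset of requested resources it already holds."""

    def query(self, identifiers: Iterable[str]) -> Mapping[str, bytes]:
        ...


class InMemoryRWCA:
    """Hypothetical in-memory reference crawl, used only for illustration."""

    def __init__(self, store: Mapping[str, bytes]) -> None:
        self.store = dict(store)

    def query(self, identifiers: Iterable[str]) -> Mapping[str, bytes]:
        # Return only the resources present in the reference crawl.
        return {i: self.store[i] for i in identifiers if i in self.store}


def fetch_from_website(url: str, timeout: float = 10.0) -> bytes:
    """Download a resource directly from the website (WS) hosting it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def build_web_corpus(
    identifiers: Iterable[str],
    rwca: ReferenceWebCrawlAgent,
    download: Callable[[str], bytes] = fetch_from_website,
    corpus: Optional[Dict[str, bytes]] = None,
) -> Dict[str, bytes]:
    """Add each identified resource to the web corpus (WCD), preferring
    the reference crawl's copy and downloading only on a miss."""
    corpus = {} if corpus is None else corpus
    ids = list(identifiers)
    # Send one query carrying the identifiers and receive the RWCA response.
    response = rwca.query(ids)
    for identifier in ids:
        if identifier in response:
            # The response contains the resource: add it directly.
            corpus[identifier] = response[identifier]
        else:
            # The response lacks it: fetch from the website, then add it.
            corpus[identifier] = download(identifier)
    return corpus
```

Passing the downloader in as a callable lets the reference-hit and reference-miss paths be exercised without network access. A production crawler would also need politeness delays, robots.txt handling, error handling, and a policy for deciding whether a reference copy is fresh enough; the summary does not specify any of these.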