Deep web crawler form filling method and device based on URL (uniform resource locator) subject classification

Bibliographic Details
Main Authors Hou Dayong, Li Qinghai, Zou Libin, Jian Songquan
Format Patent
Language Chinese, English
Published 10.08.2016
Summary: The invention provides a deep web crawler form filling method and device based on URL (uniform resource locator) subject classification. The device comprises a preprocessing unit, a downloading unit, a web page analysis unit, a web page processing unit and a storage unit. Compared with the prior art, the method and device optimize deep web crawler form filling by classifying URLs by subject. An ontology base and a semantic-based similarity matching algorithm make the deep web crawler more intelligent, so that data in related fields are enriched, mapping storage is established, and a new approach is provided for information retrieval by search engines.
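
The abstract names the ingredients (URL subject classification, an ontology base, similarity matching between form fields and ontology concepts) but gives no algorithmic detail. The following Python sketch is only one possible reading, not the patented method. The topic keywords, the ontology entries, the fill values, the function names and the threshold are all hypothetical, and token-level Jaccard overlap stands in for the unspecified semantic-based similarity matching algorithm.

```python
import re
from urllib.parse import urlparse

# Hypothetical topic keywords; the abstract does not list any.
TOPIC_KEYWORDS = {
    "books": {"book", "books", "library", "isbn"},
    "jobs": {"job", "jobs", "career", "careers"},
}

# Hypothetical ontology base: topic -> concept -> (synonyms, fill value).
ONTOLOGY = {
    "books": {
        "title": ({"title", "book", "name"}, "Data Mining"),
        "author": ({"author", "writer"}, "Smith"),
    },
    "jobs": {
        "position": ({"position", "job", "title"}, "engineer"),
        "location": ({"location", "city"}, "Beijing"),
    },
}


def tokens(text):
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify_url(url):
    """Return the topic whose keywords overlap most with the URL, or None."""
    parsed = urlparse(url)
    url_tokens = tokens(parsed.netloc + " " + parsed.path)
    best_topic, best_hits = None, 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        hits = len(url_tokens & keywords)
        if hits > best_hits:
            best_topic, best_hits = topic, hits
    return best_topic


def jaccard(a, b):
    """Jaccard overlap of two token sets (assumed stand-in for semantic similarity)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def fill_form(url, field_labels, threshold=0.2):
    """Map each form field label to a fill value from the topic's ontology."""
    topic = classify_url(url)
    if topic is None:
        return {}
    filled = {}
    for label in field_labels:
        label_tokens = tokens(label)
        best_value, best_score = None, 0.0
        for synonyms, value in ONTOLOGY[topic].values():
            score = jaccard(label_tokens, synonyms)
            if score > best_score:
                best_value, best_score = value, score
        # Leave the field unfilled when no concept is similar enough.
        if best_score >= threshold:
            filled[label] = best_value
    return filled
```

In this sketch, calling `fill_form("http://example.com/books/search", ["Book title", "Author name", "Publisher"])` would classify the URL under the hypothetical "books" topic and fill only the fields whose labels overlap sufficiently with an ontology concept; a real system would presumably use a richer ontology and a genuine semantic similarity measure.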
Bibliography: Application Number: CN20161247854