Theme crawler method and device, electronic equipment and computer readable storage medium

Bibliographic Details
Main Author: ZHANG XINLIANG
Format: Patent
Language: Chinese, English
Published: 01.09.2023

Summary: The embodiments of the invention provide a topic crawler method and device, electronic equipment, and a computer-readable storage medium. The method comprises the following steps: performing structural analysis on a to-be-crawled hyperlink in a to-be-crawled link queue to obtain link features; predicting, based on the link features and a preset topic, the relevance between the to-be-crawled hyperlink and the preset topic; and determining each to-be-crawled hyperlink whose relevance is greater than a preset relevance threshold as a target hyperlink, thereby implementing topic crawling. Because the structure of the hyperlink itself is analyzed, the content of the page that the to-be-crawled hyperlink points to does not need to be analyzed. Predicting the relevance between the link and the preset topic allows the topic of a hyperlink to be recognized accurately and greatly improves the efficiency of the topic crawler.
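The abstract does not say which link features or which prediction model the method uses, so the following is only a minimal Python sketch under stated assumptions. The hypothetical `analyze_link_structure` splits each queued URL into host, path, and query tokens; the hypothetical `predict_relevance` stands in for the undisclosed predictor with a simple keyword-coverage score (the fraction of preset topic keywords that appear among the link tokens); and `select_target_links` keeps only links whose score is strictly greater than the threshold. The stopword list, the scoring rule, and all names are illustrative assumptions, not the patented implementation. No page is fetched at any point.

```python
from __future__ import annotations

import re
from collections import deque
from dataclasses import dataclass
from urllib.parse import unquote, urlparse

# Tokens that carry little topical signal in URLs (assumed list, not from the patent).
STOPWORDS = frozenset({"www", "com", "org", "net", "html", "htm", "php", "index"})


@dataclass(frozen=True)
class LinkFeature:
    """Hypothetical link feature: the URL plus its structural tokens."""
    url: str
    tokens: frozenset[str]


def analyze_link_structure(url: str) -> LinkFeature:
    """Split a hyperlink into host, path and query tokens without fetching the page."""
    parts = urlparse(url)
    raw = " ".join([parts.hostname or "", unquote(parts.path), unquote(parts.query)])
    tokens = frozenset(
        t for t in re.split(r"[^a-z0-9]+", raw.lower())
        if len(t) > 1 and t not in STOPWORDS
    )
    return LinkFeature(url, tokens)


def predict_relevance(feature: LinkFeature, topic_keywords: set[str]) -> float:
    """Stand-in predictor (assumption): fraction of topic keywords found in the link tokens."""
    if not topic_keywords:
        return 0.0
    return len(feature.tokens & topic_keywords) / len(topic_keywords)


def select_target_links(
    queue: deque[str], topic_keywords: set[str], threshold: float
) -> list[tuple[str, float]]:
    """Drain the to-be-crawled queue and keep links scoring above the threshold."""
    targets: list[tuple[str, float]] = []
    while queue:
        url = queue.popleft()
        feature = analyze_link_structure(url)
        score = predict_relevance(feature, topic_keywords)
        if score > threshold:  # strictly greater, as in the summary
            targets.append((url, score))
    return targets


if __name__ == "__main__":
    to_crawl = deque([
        "https://example.com/news/machine-learning/crawler-design.html",
        "https://example.com/sports/football/results",
        "https://example.org/blog?tag=web-crawler&lang=en",
    ])
    topic = {"crawler", "web", "machine", "learning"}
    for url, score in select_target_links(to_crawl, topic, threshold=0.25):
        print(f"{score:.2f}  {url}")
```

Because the score depends only on the URL string, a link can be accepted or discarded before any HTTP request is made, which is the efficiency argument the summary makes. A trained classifier over richer structural features could replace `predict_relevance` without changing the queue-and-threshold logic.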
Bibliography: Application Number: CN202210160887