Theme crawler method and device, electronic equipment and computer readable storage medium
Main Author | |
---|---|
Format | Patent |
Language | Chinese, English |
Published | 01.09.2023 |
Subjects | |
Summary: | The embodiment of the invention provides a topic crawler method and device, electronic equipment, and a computer-readable storage medium. The method comprises: performing structural analysis on each to-be-crawled hyperlink in a to-be-crawled link queue to obtain link features; predicting the relevance between the to-be-crawled hyperlink and a preset topic based on the link features and the preset topic; and determining each to-be-crawled hyperlink whose relevance is larger than a preset relevance threshold as a target hyperlink, thereby realizing the topic crawler. Because the hyperlinks themselves are structurally analyzed, the page content that a to-be-crawled hyperlink points to does not have to be analyzed in order to predict the link's relevance to the preset topic. Accurate recognition of the topic of a hyperlink is achieved, and topic crawler efficiency is greatly im |
---|---|
Bibliography: | Application Number: CN202210160887 |
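
The steps named in the summary (structural analysis of a queued hyperlink, relevance prediction against a preset topic, and threshold filtering) can be illustrated with a minimal Python sketch. The record does not say which link features or which prediction model the patent uses, so everything here is an assumption made for illustration: the function names `link_features`, `relevance`, and `select_targets` are hypothetical, the features are simply word tokens taken from the URL, and the relevance score is a plain term-overlap ratio standing in for whatever predictor the patent actually claims.

```python
import re
from urllib.parse import urlparse


def link_features(url: str) -> set:
    """Structural analysis of the link only: split host, path, and query
    into lowercase word tokens. The target page is never fetched.
    (Assumed feature set; the patent's real features are not given.)"""
    parts = urlparse(url)
    text = " ".join([parts.netloc, parts.path, parts.query])
    return set(re.findall(r"[a-z]+", text.lower()))


def relevance(features: set, topic_terms: set) -> float:
    """Hypothetical stand-in predictor: the fraction of topic terms
    that appear among the link tokens."""
    if not topic_terms:
        return 0.0
    return len(features & topic_terms) / len(topic_terms)


def select_targets(queue, topic_terms, threshold):
    """Keep the links whose predicted relevance is larger than the threshold."""
    return [
        url for url in queue
        if relevance(link_features(url), topic_terms) > threshold
    ]


# Example to-be-crawled link queue and preset topic (illustrative values).
queue = [
    "https://example.com/sports/football/match-report",
    "https://example.com/finance/stock-market",
]
topic = {"football", "sports", "match"}
targets = select_targets(queue, topic, threshold=0.5)
print(targets)
```

With these example values only the first link passes the threshold, since all three topic terms appear in its path while none appear in the second link. The point of the sketch is the one made in the summary: the decision is taken from the link alone, without downloading the page it points to.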