Data Extraction Approach For Retail Crawling Engine

A computer system extracts product data from a website and correlates product records from multiple sources to one another as corresponding to the same product. A website is crawled efficiently by rendering webpages using a virtual browser that ignores blacklisted elements, extracts data from object...

Full description

Saved in:

Bibliographic Details
Main Authors	Zaytsev, Andrey, Gilfanov, Ruslan, Aggarwal, Amit
Format	Patent
Language	English
Published	30.03.2023
Subjects	CALCULATING COMPUTING COUNTING DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FORADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORYOR FORECASTING PURPOSES ELECTRIC DIGITAL DATA PROCESSING PHYSICS SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE,COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTINGPURPOSES, NOT OTHERWISE PROVIDED FOR
Online Access	Get full text

Cover

Loading…

More Information
Summary:	A computer system extracts product data from a website and correlates product records from multiple sources to one another as corresponding to the same product. A website is crawled efficiently by rendering webpages using a virtual browser that ignores blacklisted elements, extracts data from objects without rendering, and suppressing retrieval of remote resources. Data is extracted according to engine control statements including a selector and extractor. A website may be crawled repeatedly and changes in extracted data may be detected and flagged. Engine control statements may be automatically changed in response to detecting a change in the configuration of the website. Images of product records may be correlated with one another by first comparing text of the product records and selecting images for comparison based on composition. Images are compared using a machine learning model. Images determined to be similar may be presented to a human for a correlation decision.
Bibliography:	Application Number: US202117486562