Data correlation system and method
Format | Patent |
---|---|
Language | English |
Published | 20.02.2024 |
Summary: | A computer system extracts product data from a website and determines which product records from multiple sources correspond to the same product. A website is crawled efficiently by rendering webpages with a virtual browser that ignores blacklisted elements, extracts data from objects without rendering them, and suppresses retrieval of remote resources. Data is extracted according to engine control statements that include a selector and an extractor. A website may be crawled repeatedly, and changes in the extracted data may be detected and flagged. Engine control statements may be changed automatically in response to detecting a change in the configuration of the website. Images of product records may be correlated with one another by first comparing the text of the product records and then selecting images for comparison based on their composition. Images are compared using a machine learning model, and images determined to be similar may be presented to a human for a correlation decision. |
---|---|
Bibliography: | Application Number: US202117486567 |
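The extraction and change-detection steps in the summary can be illustrated with a short Python sketch. It assumes a simplified selector (one tag name plus one class token) and an extractor that reads either an element's text or one of its attributes. `ControlStatement`, `ProductExtractor`, `extract`, `detect_changes`, and the default blacklist are hypothetical names and choices, not taken from the patent. The sketch parses raw HTML with the standard-library `html.parser` instead of a virtual browser, so it only mirrors the ideas of extracting without rendering, ignoring blacklisted elements, and never fetching remote resources.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class ControlStatement:
    """Hypothetical engine control statement: a selector plus an extractor."""
    field: str       # output field name, e.g. "price"
    tag: str         # selector part: element tag name
    css_class: str   # selector part: class token the element must carry
    extractor: str   # "text" reads inner text; any other value names an attribute


class ProductExtractor(HTMLParser):
    """Applies control statements to raw HTML without rendering it. Blacklisted
    elements (assumed to have end tags) are skipped with everything inside them."""

    def __init__(self, statements, blacklist=("script", "style", "aside")):
        super().__init__()
        self.statements = statements
        self.blacklist = set(blacklist)   # hypothetical default blacklist
        self.skip_depth = 0   # > 0 while inside a blacklisted element
        self.captures = []    # open text captures: [statement, depth, text parts]
        self.results = {}     # field -> extracted value (first match wins)

    def handle_starttag(self, tag, attrs):
        if tag in self.blacklist:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        # A nested element with the same tag deepens any capture open on that tag.
        for cap in self.captures:
            if cap[0].tag == tag:
                cap[1] += 1
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        for st in self.statements:
            busy = st.field in self.results or any(c[0] is st for c in self.captures)
            if busy or st.tag != tag or st.css_class not in classes:
                continue
            if st.extractor == "text":
                self.captures.append([st, 1, []])
            else:
                self.results[st.field] = attr_map.get(st.extractor)

    def handle_endtag(self, tag):
        if tag in self.blacklist and self.skip_depth:
            self.skip_depth -= 1
            return
        if self.skip_depth:
            return
        still_open = []
        for st, depth, parts in self.captures:
            if st.tag == tag:
                depth -= 1
            if depth == 0:
                # Collapse whitespace runs in the captured text.
                self.results[st.field] = " ".join("".join(parts).split())
            else:
                still_open.append([st, depth, parts])
        self.captures = still_open

    def handle_data(self, data):
        if not self.skip_depth:
            for cap in self.captures:
                cap[2].append(data)


def extract(html, statements):
    parser = ProductExtractor(statements)
    parser.feed(html)
    parser.close()
    return parser.results


def detect_changes(previous, current):
    """Map each field whose value differs between two crawls to (old, new)."""
    fields = sorted(previous.keys() | current.keys())
    return {f: (previous.get(f), current.get(f))
            for f in fields if previous.get(f) != current.get(f)}


STATEMENTS = [
    ControlStatement("name", "h1", "title", "text"),
    ControlStatement("price", "span", "price", "text"),
    ControlStatement("image", "img", "main", "src"),
]

page_v1 = ('<h1 class="title">Blue <b>Kettle</b></h1>'
           '<aside><span class="price">$5.00</span></aside>'
           '<span class="price">$19.99</span><img class="main" src="kettle.jpg">')
page_v2 = page_v1.replace("$19.99", "$17.49")
old, new = extract(page_v1, STATEMENTS), extract(page_v2, STATEMENTS)
print(old)                       # the $5.00 decoy inside <aside> is ignored
print(detect_changes(old, new))  # only the price is flagged
```

Comparing the results of two crawls flags only the price field, which is the kind of change the summary says may be detected and flagged. Automatically rewriting control statements when the site layout changes is not attempted here.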
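The staged correlation described in the summary (compare text first, then compare images chosen by composition, then defer to a human) might look like the following sketch. The patent summary does not specify the text comparison, the composition categories, or the machine learning model, so this sketch assumes Jaccard overlap of title tokens as the text filter, a hypothetical `composition` label on each image, and precomputed embedding vectors from some unspecified image model, compared by cosine similarity. All names and both thresholds are illustrative.

```python
import math
import re
from dataclasses import dataclass, field


@dataclass
class ProductImage:
    composition: str        # hypothetical label, e.g. "white_background" or "lifestyle"
    embedding: list[float]  # assumed output of an unspecified image-embedding model


@dataclass
class ProductRecord:
    source: str
    title: str
    images: list[ProductImage] = field(default_factory=list)


def title_tokens(record):
    return set(re.findall(r"[a-z0-9]+", record.title.lower()))


def text_similarity(a, b):
    """Stage 1 stand-in: Jaccard overlap of lowercase title tokens."""
    ta, tb = title_tokens(a), title_tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def image_similarity(a, b):
    """Stage 2 stand-in: best cosine score over image pairs of the same composition."""
    scores = [cosine(ia.embedding, ib.embedding)
              for ia in a.images for ib in b.images
              if ia.composition == ib.composition]
    return max(scores, default=0.0)


def review_queue(records_a, records_b, text_min=0.5, image_min=0.9):
    """Pairs to present to a human for a correlation decision, most similar first."""
    queue = []
    for a in records_a:
        for b in records_b:
            if text_similarity(a, b) < text_min:
                continue  # text too different: never compare the images
            score = image_similarity(a, b)
            if score >= image_min:
                queue.append((score, a.title, b.title))
    return sorted(queue, reverse=True)


kettle_a = ProductRecord("shop-a", "Acme Blue Kettle 1.7L",
                         [ProductImage("white_background", [0.9, 0.1, 0.0])])
kettle_b = ProductRecord("shop-b", "Acme Kettle 1.7L Blue",
                         [ProductImage("white_background", [0.88, 0.12, 0.01]),
                          ProductImage("lifestyle", [0.1, 0.9, 0.3])])
toaster_b = ProductRecord("shop-b", "Acme Toaster 2-Slice",
                          [ProductImage("white_background", [0.2, 0.1, 0.9])])

print(review_queue([kettle_a], [kettle_b, toaster_b]))
```

Filtering on text first keeps the comparatively expensive image comparison to a small set of candidate pairs, and comparing only images of the same composition avoids scoring a studio shot against a lifestyle photo. Pairs that pass both stages are queued for a person to confirm rather than merged automatically.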