Data Correlation System And Method

Bibliographic Details
Main Authors Zaytsev, Andrey; Gilfanov, Ruslan; Aggarwal, Amit
Format Patent
Language English
Published 30.03.2023

Abstract A computer system extracts product data from a website and correlates product records from multiple sources to one another as corresponding to the same product. A website is crawled efficiently by rendering webpages using a virtual browser that ignores blacklisted elements, extracts data from objects without rendering, and suppresses retrieval of remote resources. Data is extracted according to engine control statements that include a selector and an extractor. A website may be crawled repeatedly, and changes in extracted data may be detected and flagged. Engine control statements may be automatically changed in response to detecting a change in the configuration of the website. Images of product records may be correlated with one another by first comparing the text of the product records and then selecting images for comparison based on their composition. Images are compared using a machine learning model. Images determined to be similar may be presented to a human for a correlation decision.
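The abstract describes extraction driven by engine control statements made of a selector and an extractor, plus repeated crawls whose changes are flagged. The Python sketch below illustrates that idea under stated assumptions; it is not the patented implementation. It assumes each page is already available as a nested data object (for example, data embedded in a page that can be read without rendering), that a selector is a dotted path into that object, and that an extractor is a normalizing function. The names `ControlStatement`, `select`, `extract`, `detect_changes`, and `missing_fields` are hypothetical, as are the sample page objects.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ControlStatement:
    """Hypothetical engine control statement: one selector paired with one extractor."""
    field: str                       # product attribute this statement fills
    selector: str                    # assumed dotted path, e.g. "product.offers.0.price"
    extractor: Callable[[Any], Any]  # normalizes the selected raw value


def select(obj: Any, path: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts/lists; return None if any step is missing."""
    current = obj
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        else:
            return None
    return current


def extract(page_obj: Dict[str, Any], statements: List[ControlStatement]) -> Dict[str, Any]:
    """Apply every control statement to one page object, producing a product record."""
    record: Dict[str, Any] = {}
    for stmt in statements:
        raw = select(page_obj, stmt.selector)
        record[stmt.field] = None if raw is None else stmt.extractor(raw)
    return record


def detect_changes(previous: Dict[str, Any], current: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Flag fields whose value differs between two crawls, as (old, new) pairs."""
    return {
        field: (previous.get(field), current.get(field))
        for field in sorted(previous.keys() | current.keys())
        if previous.get(field) != current.get(field)
    }


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """Fields whose selector matched nothing; many at once may hint at a site layout change."""
    return sorted(field for field, value in record.items() if value is None)


# Usage with made-up page objects from two crawls of the same product page.
STATEMENTS = [
    ControlStatement("title", "product.name", lambda v: str(v).strip()),
    ControlStatement("price", "product.offers.0.price", float),
]
crawl_1 = {"product": {"name": " Desk Lamp ", "offers": [{"price": "19.99"}]}}
crawl_2 = {"product": {"name": "Desk Lamp", "offers": [{"price": "17.49"}]}}

print(detect_changes(extract(crawl_1, STATEMENTS), extract(crawl_2, STATEMENTS)))
# {'price': (19.99, 17.49)}
print(missing_fields(extract({"item": {}}, STATEMENTS)))
# ['price', 'title']
```

Treating a burst of empty fields as a signal is only one plausible trigger for revising control statements when a site's configuration changes; this record does not state the criteria the patent actually uses.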
ContentType Patent
DBID EVB
DatabaseName esp@cenet
DeliveryMethod fulltext_linktorsrc
Discipline Medicine
Chemistry
Sciences
Physics
ExternalDocumentID US2023096058A1
GroupedDBID EVB
ID FETCH-epo_espacenet_US2023096058A13
IEDL.DBID EVB
IngestDate Fri Jul 19 13:08:33 EDT 2024
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-epo_espacenet_US2023096058A13
Notes Application Number: US202117486567
OpenAccessLink https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20230330&DB=EPODOC&CC=US&NR=2023096058A1
ParticipantIDs epo_espacenet_US2023096058A1
PublicationCentury 2000
PublicationDate 20230330
PublicationDateYYYYMMDD 2023-03-30
PublicationDecade 2020
PublicationYear 2023
RelatedCompanies The Yes Platform, Inc
SourceID epo
SourceType Open Access Repository
SubjectTerms CALCULATING
COMPUTING
COUNTING
ELECTRIC DIGITAL DATA PROCESSING
IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
PHYSICS
Title Data Correlation System And Method
URI https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20230330&DB=EPODOC&locale=&CC=US&NR=2023096058A1
hasFullText 1
inHoldings 1
linkProvider European Patent Office