Custodian disambiguation and data matching

Bibliographic Details
Main Authors: Petrenko, Pavlo; Hampp-Bahnmueller, Thomas A. P; Lorch, Markus; Bremer, Lars; Schmid, Sebastian B
Format: Patent
Language: English
Published: 28.02.2023
Summary: A technique is provided for matching different user representations of a person across a plurality of computer systems. The technique includes collecting information sets about user representations from a plurality of computer systems; normalizing the information sets to a unified format; grouping the information sets in the unified format into indexing buckets based on a user name using a non-phonetic algorithm; determining a similarity score for each pair of information sets in each of the indexing buckets; classifying each information set pair into a set of classes based on the similarity scores, wherein the set of classes comprises at least matches and non-matches; and using a data structure for merging information of information set pairs classified as matches.
Bibliography: Application Number: US201514692543
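
The steps listed in the summary (normalize, bucket by name, score pairs, classify, merge) can be illustrated with a minimal Python sketch. This is not the patented implementation; it only shows the general shape of such a pipeline under stated assumptions. The record fields (`id`, `name`, `email`), the bucket key (sorted initials of the name tokens), the scoring via `difflib.SequenceMatcher`, the 0.85 threshold, and the union-find merge structure are all hypothetical choices made for illustration, and the sample records are invented.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Invented sample records from three hypothetical source systems.
RECORDS = [
    {"id": "mail:1", "name": "Jane Doe", "email": "Jane.Doe@example.com"},
    {"id": "hr:7", "name": "Doe, Jane", "email": "jane.doe@example.com"},
    {"id": "chat:3", "name": "John Dorn", "email": "john.dorn@example.com"},
    {"id": "mail:9", "name": "Alice Smith", "email": "alice.smith@example.com"},
]


def normalize(record):
    """Unified format (assumed): lowercase, no punctuation, name tokens sorted."""
    cleaned = record.get("name", "").lower().replace(",", " ").replace(".", " ")
    return {
        "id": record["id"],
        "name": " ".join(sorted(cleaned.split())),
        "email": record.get("email", "").strip().lower(),
    }


def bucket_key(rec):
    """Assumed non-phonetic indexing key: initials of the sorted name tokens."""
    return "".join(token[0] for token in rec["name"].split())


def similarity(a, b):
    """Assumed score: mean of string similarity on name and on email."""
    name_score = SequenceMatcher(None, a["name"], b["name"]).ratio()
    email_score = SequenceMatcher(None, a["email"], b["email"]).ratio()
    return (name_score + email_score) / 2


class UnionFind:
    """Assumed merge structure: disjoint sets over record ids."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def match_records(records, threshold=0.85):
    """Return sets of record ids judged to represent the same person."""
    normalized = [normalize(r) for r in records]

    # Group records into indexing buckets so only plausible pairs are compared.
    buckets = defaultdict(list)
    for rec in normalized:
        buckets[bucket_key(rec)].append(rec)

    merged = UnionFind()
    for rec in normalized:
        merged.find(rec["id"])  # register every id, including singletons

    # Score each pair inside a bucket; pairs at or above the threshold are
    # classified as matches and merged, all others as non-matches.
    for bucket in buckets.values():
        for a, b in combinations(bucket, 2):
            if similarity(a, b) >= threshold:
                merged.union(a["id"], b["id"])

    groups = defaultdict(set)
    for rec in normalized:
        groups[merged.find(rec["id"])].add(rec["id"])
    return sorted(groups.values(), key=sorted)


if __name__ == "__main__":
    for group in match_records(RECORDS):
        print(sorted(group))
```

In this sketch "Jane Doe" and "Doe, Jane" normalize to the same name and land in the same bucket as "John Dorn", so the bucket holds one pair that should merge and pairs that should not. Bucketing keeps the number of pairwise comparisons small, and the union-find structure makes merges transitive: if A matches B and B matches C, all three end up in one group.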