FREE TEXT DE-IDENTIFICATION

Bibliographic Details
Main Authors PLETEA, DANIEL; VAN LIESDONK, PETER PETRUS; KOSTER, ROBERT PAUL
Format Patent
Language English
Published 30.09.2021
Summary: A system or method generates de-identified output from a data set of patient data comprising unstructured text (100) in natural language phrases. A blacklist (105) has word items that are not allowed. The unstructured text is processed to determine a word count (110) comprising a list of low-rate word items that have a number of occurrences (k) in the unstructured text below a threshold (120). Subsequently, the low-rate word items and the blacklist word items are masked (130) in the unstructured text to generate the de-identified output (140).
Bibliography: Application Number: US201917260265
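
The procedure described in the summary (count word occurrences, collect the low-rate words below the threshold, then mask those together with the blacklisted words) can be sketched roughly as follows. This is only an illustrative sketch and not the patented implementation: the function name `deidentify`, the `[MASKED]` placeholder, the regex tokenizer, and the case-insensitive matching are all assumptions that the record does not specify.

```python
import re
from collections import Counter

MASK = "[MASKED]"  # hypothetical placeholder; the record does not name a mask token
WORD_RE = re.compile(r"[A-Za-z0-9']+")  # assumed, simplistic tokenizer


def deidentify(text, blacklist, threshold):
    """Mask blacklisted words and words occurring fewer than `threshold` times."""
    blocked = {w.lower() for w in blacklist}

    # Word count (110): occurrences k of each word item in the unstructured text.
    counts = Counter(w.lower() for w in WORD_RE.findall(text))

    # Low-rate word items: k below the threshold (120).
    low_rate = {w for w, k in counts.items() if k < threshold}

    to_mask = low_rate | blocked

    # Masking step (130): replace every matching word, leave the rest untouched.
    def replace(match):
        word = match.group(0)
        return MASK if word.lower() in to_mask else word

    return WORD_RE.sub(replace, text)


notes = (
    "Patient has fever. Patient has cough. "
    "Patient Smith has fever and cough. Patient has fever."
)
result = deidentify(notes, blacklist=["cough"], threshold=2)
print(result)
# Patient has fever. Patient has [MASKED]. Patient [MASKED] has fever [MASKED] [MASKED]. Patient has fever.
```

In this toy input, "Smith" and "and" each occur once and fall below the threshold of 2, while "cough" is masked only because it is on the blacklist. A realistic corpus would be far larger, so common function words such as "and" would normally clear the threshold and only rare tokens such as names would be masked.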