FREE TEXT DE-IDENTIFICATION
Main Authors | , , |
---|---|
Format | Patent |
Language | English |
Published | 30.09.2021 |
Subjects | |
Summary: | A system or method generates de-identified output from a data set of patient data comprising unstructured text (100) in natural language phrases. A blacklist (105) has word items that are not allowed. The unstructured text is processed to determine a word count (110) comprising a list of low-rate word items that have a number of occurrences (k) in the unstructured text below a threshold (120). Subsequently, the low-rate word items and the blacklist word items are masked (130) in the unstructured text to generate the de-identified output (140). |
---|---|
Bibliography: | Application Number: US201917260265 |
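
The masking steps in the summary (count word items, collect those occurring fewer than a threshold number of times, then mask them together with the blacklist items) can be illustrated with a short sketch. This is not the patented implementation: the function name `deidentify`, the regex tokenizer, the case-insensitive matching, and the `[MASKED]` placeholder are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical tokenizer: a "word item" is taken to be a run of letters,
# digits, or apostrophes. The patent record does not define tokenization.
TOKEN = re.compile(r"[A-Za-z0-9']+")


def deidentify(text, blacklist, threshold, mask="[MASKED]"):
    """Mask blacklist words and words occurring fewer than `threshold` times."""
    # Word count (110): occurrences k of each word item, compared
    # case-insensitively (an assumption).
    counts = Counter(m.group().lower() for m in TOKEN.finditer(text))
    # Low-rate word items: k below the threshold (120).
    low_rate = {word for word, k in counts.items() if k < threshold}
    # Items to mask (130): low-rate words plus blacklist (105) words.
    blocked = low_rate | {word.lower() for word in blacklist}
    # Replace each blocked token in a single pass over the original text.
    return TOKEN.sub(
        lambda m: mask if m.group().lower() in blocked else m.group(),
        text,
    )


sample = (
    "Patient reports pain. Patient denies fever. "
    "Patient Smith reports pain and fever."
)
result = deidentify(sample, blacklist={"fever"}, threshold=2)
print(result)
# Patient reports pain. Patient [MASKED] [MASKED]. Patient [MASKED] reports pain [MASKED] [MASKED].
```

In this toy input the rare surname is masked because it appears only once, but so are ordinary rare words such as "denies" and "and"; on a realistically large data set, common vocabulary would clear the threshold while rare identifiers would not.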