FREE TEXT DE-IDENTIFICATION

Bibliographic Details
Main Authors	PLETEA, DANIEL; VAN LIESDONK, PETER PETRUS; KOSTER, ROBERT PAUL
Format	Patent
Language	English
Published	30.09.2021
Subjects	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; HANDLING RECORD CARRIERS; HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA; INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS; PHYSICS; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS

More Information
Summary:	A system or method generates de-identified output from a data set of patient data comprising unstructured text (100) in natural language phrases. A blacklist (105) has word items that are not allowed. The unstructured text is processed to determine a word count (110) comprising a list of low-rate word items that have a number of occurrences (k) in the unstructured text below a threshold (120). Subsequently, the low-rate word items and the blacklist word items are masked (130) in the unstructured text to generate the de-identified output (140).
Bibliography:	Application Number: US201917260265
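The summary describes three steps: count how often each word item occurs (k), treat words with k below a threshold as low-rate, and mask both the low-rate words and the blacklist words. The following is a minimal Python sketch of those steps. It is not the patented implementation. The function name `deidentify`, the mask token `[X]`, the `\w+` tokenization, the case-insensitive matching and the choice to count words across the whole data set (rather than per note) are all illustrative assumptions.

```python
import re
from collections import Counter

# Assumption: a "word item" is a run of word characters (letters, digits, underscore).
WORD_RE = re.compile(r"\w+")


def deidentify(notes: list[str], blacklist: set[str], threshold: int,
               mask: str = "[X]") -> list[str]:
    """Hypothetical sketch: mask blacklisted and low-rate words in free-text notes."""
    # Word count over the whole data set; matching is case-insensitive (assumption).
    counts = Counter(w.lower() for note in notes for w in WORD_RE.findall(note))

    # Low-rate word items: number of occurrences k strictly below the threshold.
    low_rate = {w for w, k in counts.items() if k < threshold}

    # Words to mask: the union of low-rate words and blacklist words.
    to_mask = low_rate | {b.lower() for b in blacklist}

    def replace(match: re.Match) -> str:
        word = match.group(0)
        return mask if word.lower() in to_mask else word

    # Substitute only word tokens, so punctuation and spacing are preserved.
    return [WORD_RE.sub(replace, note) for note in notes]
```

For example, `deidentify(["Mercy clinic: Smith has fever.", "Mercy clinic: Jones has fever.", "Mercy clinic: Jones has cough."], {"Mercy", "Jones"}, threshold=2)` masks "Smith" and "cough" because each occurs only once. It also masks "Mercy" and "Jones" because they are on the blacklist, even though they occur often. The masked "cough" shows a trade-off of frequency-based masking: rare but clinically useful words are hidden along with rare identifiers.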