Information Extraction from German Clinical Care Documents in Context of Alzheimer’s Disease

Bibliographic Details
Published in	Applied Sciences, Vol. 11, no. 22, p. 10717
Main Authors	Langnickel, Lisa; Krockauer, Kilian; Uebachs, Mischa; Schaaf, Sebastian; Madan, Sumit; Klockgether, Thomas; Fluck, Juliane
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.11.2021
Subjects	Alzheimer's disease; clinical text mining; data standardization; Datasets; Dementia; Dementia disorders; Dictionaries; Hospitals; Information retrieval; Language; Memory; Neurodegeneration; Patients; semantic interoperability

Summary:	Dementia affects approximately 50 million people worldwide, the majority of whom suffer from Alzheimer’s disease (AD). The availability of long-term patient data is one of the most important prerequisites for a better understanding of diseases. Worldwide, many prospective, longitudinal cohort studies have been initiated to understand AD. However, this approach takes years to enroll and follow up a substantial number of patients, resulting in a current lack of data. This raises the question of whether clinical routine datasets could be used to extend collected registry data. It is therefore necessary to assess what kind of information is available in memory clinic routine databases. We did this using the University Hospital Bonn as an example. Whereas a number of data items are available in machine-readable formats, additional valuable information is stored in textual documents. Extracting information from such documents is only possible via text mining methods. Therefore, we set up modular, rule-based text mining workflows requiring minimal sets of training data. The system achieves F1-scores over 95% for the most relevant classes, i.e., memory disturbances from medical reports and quantitative scores from semi-structured neuropsychological test protocols. Thus, we created a machine-readable core dataset for over 8000 patient visits over a ten-year period.
ISSN:	2076-3417
DOI:	10.3390/app112210717
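
The summary describes rule-based extraction of quantitative scores from semi-structured test protocols, evaluated with F1-scores. The following is a minimal sketch of that general idea, not the authors' workflow: the line format ("<test name>: <score>[/<maximum>]"), the regular expression, the function names `extract_scores` and `f1_score`, and the example protocol text are all assumptions made for illustration.

```python
import re

# Hypothetical line format: "<test name>: <score>[/<maximum>]".
# This pattern is an illustrative assumption, not the rule set used in the paper.
SCORE_LINE = re.compile(
    r"^\s*(?P<test>[^:\n]+?)\s*:\s*"
    r"(?P<score>\d+(?:[.,]\d+)?)\s*(?:/\s*(?P<max>\d+))?\s*$"
)


def extract_scores(protocol_text):
    """Return {test name: (score, maximum or None)} for every matching line."""
    scores = {}
    for line in protocol_text.splitlines():
        match = SCORE_LINE.match(line)
        if match is None:
            continue  # free-text lines are ignored by this rule
        # German documents often use a decimal comma, so normalise it.
        score = float(match.group("score").replace(",", "."))
        maximum = int(match.group("max")) if match.group("max") else None
        scores[match.group("test")] = (score, maximum)
    return scores


def f1_score(predicted, gold):
    """F1 (harmonic mean of precision and recall) over two sets of items."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up protocol text; the test names are placeholders.
example = "Test A: 27/30\nTest B: 12,5\nBemerkung: Patient kooperativ"
extracted = extract_scores(example)

# Made-up gold annotation with one item the rule does not find.
gold = {("Test A", (27.0, 30)), ("Test B", (12.5, None)), ("Test C", (5.0, None))}
score = f1_score(set(extracted.items()), gold)
```

In this sketch the free-text line is skipped because nothing numeric follows the colon, and the F1 value compares the extracted (test, score) pairs against a hand-made gold set. A real system would need rules tailored to the actual document layouts.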