Augmented intelligence with natural language processing applied to electronic health records for identifying patients with non-alcoholic fatty liver disease at risk for disease progression
Published in | International journal of medical informatics (Shannon, Ireland) Vol. 129; pp. 334 - 341 |
---|---|
Main Authors | , , , , , , , , , , |
Format | Journal Article |
Language | English |
Published | Ireland, Elsevier B.V, 01.09.2019 |
Summary: | • NAFLD is documented poorly in the EMR. We assess how well we can identify it using an NLP approach versus ICD codes or text search.
• NAFLD can progress to NASH and cirrhosis. We examine our ability to measure disease progression within the EMR with NLP.
• We look at breakdowns in the knowledge chain between doctors, where NAFLD was identified but not mentioned in later notes.
• We identify cases of these breakdowns where the patient developed NASH/cirrhosis without any reference to the prior NAFLD diagnosis.
Electronic health record (EHR) systems contain structured data (such as diagnostic codes) and unstructured data (clinical documentation). Clinical insights can be derived from analyzing both. The use of natural language processing (NLP) algorithms to effectively analyze unstructured data has been well demonstrated. Here we examine the utility of NLP for identifying patients with non-alcoholic fatty liver disease (NAFLD), assessing patterns of disease progression, and identifying gaps in care related to breakdowns in communication among providers.
All clinical notes available for the 38,575 patients enrolled in the Mount Sinai BioMe cohort were loaded into the NLP system. We compared analysis of structured and unstructured EHR data using NLP, free-text search, and diagnostic codes, with validation against expert adjudication. We then used the NLP findings to measure physician impression of progression from early-stage NAFLD to non-alcoholic steatohepatitis (NASH) or cirrhosis. We also used the same NLP findings to identify mentions of NAFLD in radiology reports that did not persist into clinical notes.
Out of 38,575 patients, we identified 2,281 patients with NAFLD. From the remainder, 10,653 patients with similar data density were selected as a control group. NLP outperformed ICD and text search in both sensitivity (NLP: 0.93, ICD: 0.28, text search: 0.81) and F2 score (NLP: 0.92, ICD: 0.34, text search: 0.81). Of the 2,281 NAFLD patients, 673 (29.5%) were believed to have progressed to NASH or cirrhosis. Among the 176 patients in whom NAFLD was noted prior to NASH, the average progression time was 410 days. In 619 (27.1%) NAFLD patients, the condition was documented only in radiology notes and not acknowledged in other forms of clinical documentation. Of these, 170 (28.4%) were later identified as having likely developed NASH or cirrhosis after a median of 1,057.3 days.
NLP-based approaches were more accurate at identifying NAFLD within the EHR than ICD/text search-based approaches. Suspected NAFLD on imaging is often not acknowledged in subsequent clinical documentation. Many such patients are later found to have more advanced liver disease. Analysis of information flows demonstrated loss of key information that could have been used to help prevent the progression of early NAFLD (NAFL) to NASH or cirrhosis.
For identification of NAFLD, NLP performed better than alternative selection modalities. It then facilitated analysis of knowledge flow between physicians and enabled the identification of breakdowns where key information was lost that could have slowed or prevented later disease progression. |
---|---|
Bibliography: | AUTHORS’ CONTRIBUTIONS. Conception and design: TTVV, LC, GNN; Analysis and interpretation: TTVV, GNN; Data collection: TTVV, LC, GNN; Writing the article: TTVV, LC, GNN; Critical revision of the article: PB, CKC, JLK, SBE, RD, RL; Final approval of the article: TTVV, GNN; Statistical analysis: TTVV, GNN; Obtained funding: JC, SGC, GNN; Overall responsibility: TTVV, LC, GNN. Equal Contribution |
ISSN: | 1386-5056 1872-8243 |
DOI: | 10.1016/j.ijmedinf.2019.06.028 |
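
The summary compares methods by sensitivity and F2 score. For readers unfamiliar with the metric, the sketch below shows the standard F-beta definition, in which beta = 2 weights recall (sensitivity) more heavily than precision. The function name `f_beta` and the precision value in the example are illustrative assumptions; the record does not report precision, and this is not the authors' evaluation code.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    With beta = 2 (the F2 score), recall counts more than precision.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1.0 + b2) * precision * recall / denominator


if __name__ == "__main__":
    # Recall 0.93 is the NLP sensitivity quoted in the summary.
    # Precision 0.90 is a hypothetical value chosen only for illustration.
    print(round(f_beta(precision=0.90, recall=0.93), 3))
```

Because recall dominates the F2 score, a method that misses many true cases scores poorly even if nearly all of the cases it does flag are correct. That property suits a screening task where missed cases are the main concern.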