Augmented intelligence with natural language processing applied to electronic health records for identifying patients with non-alcoholic fatty liver disease at risk for disease progression

Bibliographic Details
Published in International journal of medical informatics (Shannon, Ireland) Vol. 129; pp. 334-341
Main Authors Van Vleck, Tielman T., Chan, Lili, Coca, Steven G., Craven, Catherine K., Do, Ron, Ellis, Stephen B., Kannry, Joseph L., Loos, Ruth J.F., Bonis, Peter A., Cho, Judy, Nadkarni, Girish N.
Format Journal Article
Language English
Published Ireland Elsevier B.V. 01.09.2019

Summary:
• NAFLD is documented poorly in the EMR. We assess how well we can identify it using an NLP approach versus ICD or text search.
• NAFLD can progress to NASH and cirrhosis. We examine our ability to measure disease progression within the EMR with NLP.
• We look at breakdowns in the knowledge chain between doctors, when NAFLD was identified but not mentioned in future notes.
• We identify cases of these breakdowns where the patient developed NASH/cirrhosis without any reference to the prior NAFLD diagnosis.

Electronic health record (EHR) systems contain structured data (such as diagnostic codes) and unstructured data (clinical documentation). Clinical insights can be derived from analyzing both. The use of natural language processing (NLP) algorithms to effectively analyze unstructured data has been well demonstrated. Here we examine the utility of NLP to identify patients with non-alcoholic fatty liver disease (NAFLD), assess patterns of disease progression, and identify gaps in care related to breakdowns in communication among providers.

All clinical notes available on the 38,575 patients enrolled in the Mount Sinai BioMe cohort were loaded into the NLP system. We compared analysis of structured and unstructured EHR data using NLP, free-text search, and diagnostic codes, with validation against expert adjudication. We then used the NLP findings to measure physician impression of progression from early-stage NAFLD to non-alcoholic steatohepatitis (NASH) or cirrhosis. Similarly, we used the same NLP findings to identify mentions of NAFLD in radiology reports that did not persist into clinical notes.

Out of 38,575 patients, we identified 2,281 patients with NAFLD. From the remainder, 10,653 patients with similar data density were selected as a control group. NLP outperformed ICD and text search in both sensitivity (NLP: 0.93, ICD: 0.28, text search: 0.81) and F2 score (NLP: 0.92, ICD: 0.34, text search: 0.81). Of the 2,281 NAFLD patients, 673 (29.5%) were believed to have progressed to NASH or cirrhosis. Among the 176 patients in whom NAFLD was noted prior to NASH, the average progression time was 410 days. In 619 (27.1%) NAFLD patients, the condition was documented only in radiology notes and not acknowledged in other forms of clinical documentation. Of these, 170 (28.4%) were later identified as having likely developed NASH or cirrhosis after a median of 1,057.3 days.

NLP-based approaches were more accurate at identifying NAFLD within the EHR than ICD or text search-based approaches. Suspected NAFLD on imaging is often not acknowledged in subsequent clinical documentation, and many such patients are later found to have more advanced liver disease. Analysis of information flows demonstrated loss of key information that could have been used to help prevent the progression of early NAFLD (NAFL) to NASH or cirrhosis.

For identification of NAFLD, NLP performed better than alternative selection modalities. It then facilitated analysis of knowledge flow between physicians and enabled the identification of breakdowns where key information was lost that could have slowed or prevented later disease progression.
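The F2 score reported in the summary is the F-beta score with beta = 2, which weights sensitivity (recall) more heavily than precision. The following is a minimal Python sketch of that general formula, not the authors' evaluation code. The function name `f_beta` and the precision value of 0.90 are illustrative assumptions, since the summary does not report precision; only the recall of 0.93 is taken from the text.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Return the F-beta score; beta > 1 weights recall above precision."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if beta <= 0:
        raise ValueError("beta must be positive")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Recall 0.93 is the NLP sensitivity quoted in the summary.
# Precision 0.90 is a hypothetical value chosen only for illustration.
illustrative_f2 = f_beta(precision=0.90, recall=0.93)
print(f"Illustrative F2: {illustrative_f2:.3f}")
```

Because beta is squared in the formula, a method with low sensitivity, such as the ICD-based selection, is penalized heavily even if its precision is high.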
AUTHORS’ CONTRIBUTIONS
Conception and design: TTVV, LC, GNN
Analysis and interpretation: TTVV, GNN
Data collection: TTVV, LC, GNN
Writing the article: TTVV, LC, GNN
Critical revision of the article: PB, CKC, JLK, SBE, RD, RL
Final approval of the article: TTVV, GNN
Statistical analysis: TTVV, GNN
Obtained funding: JC, SGC, GNN
Overall responsibility: TTVV, LC, GNN
Equal Contribution
ISSN:1386-5056, 1872-8243
DOI:10.1016/j.ijmedinf.2019.06.028