Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system
Published in | JAMIA open Vol. 6; no. 4; p. ooad085 |
---|---|
Format | Journal Article |
Language | English |
Published | United States: Oxford University Press, 01.12.2023 |
Summary: | Objectives
To develop and test a scalable, performant, and rule-based model for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the unstructured data in electronic health records (EHRs).
Materials and Methods
We included patients aged 18 years or older who received care at the Johns Hopkins Health System (JHHS) between July 2016 and June 2021 and had at least 1 unstructured (free-text) note in their EHR during the study period. We used a combination of manual lexicon curation and semiautomated lexicon creation for feature development. We developed an initial rule-based pipeline (Match Pipeline) using 2 keyword sets for each social needs domain. We performed rule-based keyword matching for distinct lexicons and tested the algorithm on an annotated dataset of 192 patients. Starting with a set of expert-identified keywords, we refined the keyword sets and tested each adjustment by evaluating the false positives and false negatives identified in the labeled dataset. We assessed the performance of the algorithm using precision, recall, and F1 score.
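The keyword-matching and evaluation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's Match Pipeline: the lexicon terms, the domain keys, and the helper names (`match_note`, `flag_patients`, `precision_recall_f1`) are hypothetical. Because the summary does not say how the 2 keyword sets per domain are combined, the sketch simply flags a domain when a term from either set appears in a note; a patient counts as positive for a domain if any of their notes matches, and these patient-level flags are scored against annotated labels.

```python
import re

# HYPOTHETICAL lexicons: two keyword sets per domain. The terms are
# illustrative placeholders, not the curated JHHS lexicons.
LEXICONS = {
    "homelessness": (["homeless", "homelessness"],
                     ["lives in a shelter", "living in car"]),
    "food_insecurity": (["food insecurity", "food insecure"],
                        ["food bank", "not enough to eat"]),
    "transportation": (["no transportation", "lack of transportation"],
                       ["no ride to appointments"]),
}


def compile_terms(terms):
    """Case-insensitive, whole-word pattern for each keyword or phrase."""
    return [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms]


COMPILED = {d: (compile_terms(a), compile_terms(b)) for d, (a, b) in LEXICONS.items()}


def match_note(text):
    """Return the domains whose first or second keyword set matches the note."""
    return {
        domain
        for domain, (set_a, set_b) in COMPILED.items()
        if any(p.search(text) for p in set_a + set_b)
    }


def flag_patients(notes_by_patient):
    """Flag a patient for a domain if any of their notes matches it."""
    return {
        pid: set().union(*(match_note(n) for n in notes))
        for pid, notes in notes_by_patient.items()
    }


def precision_recall_f1(predicted, gold, domain):
    """Patient-level precision, recall, and F1 for one domain."""
    tp = fp = fn = 0
    for pid, labels in gold.items():
        pred = domain in predicted.get(pid, set())
        actual = domain in labels
        tp += pred and actual
        fp += pred and not actual
        fn += actual and not pred
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy notes and annotations (hypothetical, not study data).
notes = {
    "p1": ["Patient is currently homeless and staying with friends."],
    "p2": ["Reports using a food bank weekly."],
    "p3": ["No acute issues today."],
    "p4": ["Lives in a shelter downtown."],
}
gold = {"p1": {"homelessness"}, "p2": {"food_insecurity"},
        "p3": {"homelessness"}, "p4": set()}
flags = flag_patients(notes)
print(precision_recall_f1(flags, gold, "homelessness"))  # (0.5, 0.5, 0.5)
```

Here p4 is a false positive and p3 a false negative, which are exactly the cases a reviewer would inspect when adjusting the keyword sets. Plain keyword matching of this kind does not handle negation (for example, "Denies homelessness." still matches); the summary does not say whether or how the study's pipeline addresses this.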
Results
The algorithm for identifying residential instability had the best overall performance, with weighted averages for precision, recall, and F1 score of 0.92, 0.84, and 0.92, respectively, for identifying patients experiencing homelessness and 0.84, 0.82, and 0.79, respectively, for identifying patients with housing insecurity. The food insecurity algorithm also achieved high metrics, whereas the transportation issues algorithm had the lowest overall performance.
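The summary reports weighted averages without defining the weights. A common convention, and the one assumed in this sketch, is support weighting (as in scikit-learn's `average="weighted"`): each class's precision, recall, and F1 are averaged with weights proportional to the number of true instances of that class. Under this convention the weighted F1 is an average of per-class F1 scores, so it need not equal the harmonic mean of the weighted precision and recall. The function names and toy labels below are hypothetical.

```python
def per_class_prf(y_true, y_pred, label):
    """Precision, recall, and F1 treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_prf(y_true, y_pred):
    """Support-weighted average of per-class precision, recall, and F1 (assumed weighting)."""
    n = len(y_true)
    totals = [0.0, 0.0, 0.0]
    for label in sorted(set(y_true) | set(y_pred)):
        support = sum(t == label for t in y_true)  # true instances of this class
        for i, metric in enumerate(per_class_prf(y_true, y_pred, label)):
            totals[i] += metric * support / n
    return tuple(totals)


# Toy patient-level labels: 1 = need documented, 0 = not documented (hypothetical).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(weighted_prf(y_true, y_pred))  # each value ≈ 0.75
```

For these toy labels the positive class scores 2/3 on all three metrics (support 3) and the negative class scores 0.8 (support 5), so each weighted average works out to (3 × 2/3 + 5 × 0.8) / 8 = 0.75.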
Discussion
The natural language processing (NLP) algorithm for identifying social needs at JHHS performed relatively well, suggesting an opportunity for implementation in a healthcare system.
Conclusion
The NLP approach developed in this project could be adapted and potentially operationalized in the routine data processes of a healthcare system.
Lay Summary
We developed and tested an algorithm for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the free-text notes in electronic health records (EHRs). We included patients aged 18 years or older who received care at the Johns Hopkins Health System between July 2016 and June 2021 and had at least 1 note in their EHR during the study period. We developed keywords and phrases describing the social needs and built natural language processing (NLP) algorithms that used those keywords to identify different social needs in free-text EHR notes. We assessed the performance of these algorithms by comparing what they identified in the notes with what a human reviewer identified by reading the notes directly. The algorithm for identifying residential instability had the best overall performance, the algorithm for identifying food insecurity performed relatively well, and the transportation issues algorithm had the lowest overall performance. The NLP algorithms developed in this study offer an opportunity for implementation in different healthcare systems and could be adapted and potentially operationalized in their routine data processes. |
---|---|
Bibliography: | Author Contributions: Geoffrey M. Gray and Ayah Zirikly are dual first authors. |
ISSN: | 2574-2531 |
DOI: | 10.1093/jamiaopen/ooad085 |