Identifying Symptom Information in Clinical Notes Using Natural Language Processing
Published in | Nursing research (New York) Vol. 70; no. 3; p. 173 |
---|---|
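Main Authors | Koleck, Theresa A; Tatonetti, Nicholas P; Bakken, Suzanne; Mitha, Shazia; Henderson, Morgan M; George, Maureen; Miaskowski, Christine; Smaldone, Arlene; Topaz, Maxim |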
Format | Journal Article |
Language | English |
Published | United States, 01.05.2021 |
Subjects | |
Abstract | Symptoms are a core concept of nursing interest. Large-scale secondary data reuse of notes in electronic health records (EHRs) has the potential to increase the quantity and quality of symptom research. However, the symptom language used in clinical notes is complex. A need exists for methods designed specifically to identify and study symptom information from EHR notes.
We aim to describe a method that combines standardized vocabularies, clinical expertise, and natural language processing to generate comprehensive symptom vocabularies and identify symptom information in EHR notes. We piloted this method with five diverse symptom concepts: constipation, depressed mood, disturbed sleep, fatigue, and palpitations.
First, we obtained synonym lists for each pilot symptom concept from the Unified Medical Language System. Then, we used two large bodies of text (clinical notes from Columbia University Irving Medical Center and PubMed abstracts containing Medical Subject Headings or key words related to the pilot symptoms) to further expand our initial vocabulary of synonyms for each pilot symptom concept. We used NimbleMiner, an open-source natural language processing tool, to accomplish these tasks and evaluated NimbleMiner symptom identification performance by comparison to a manually annotated set of nurse- and physician-authored common EHR note types.
Compared to the baseline Unified Medical Language System synonym lists, we identified up to 11 times more additional synonym words or expressions, including abbreviations, misspellings, and unique multiword combinations, for each symptom concept. Natural language processing system symptom identification performance was excellent.
Using our comprehensive symptom vocabularies and NimbleMiner to label symptoms in clinical notes produced excellent performance metrics. The ability to extract symptom information from EHR notes in an accurate and scalable manner has the potential to greatly facilitate symptom science research. |
---|---|
Author | Bakken, Suzanne George, Maureen Koleck, Theresa A Mitha, Shazia Miaskowski, Christine Henderson, Morgan M Topaz, Maxim Tatonetti, Nicholas P Smaldone, Arlene |
Author_xml | – sequence: 1 givenname: Theresa A surname: Koleck fullname: Koleck, Theresa A – sequence: 2 givenname: Nicholas P surname: Tatonetti fullname: Tatonetti, Nicholas P – sequence: 3 givenname: Suzanne surname: Bakken fullname: Bakken, Suzanne – sequence: 4 givenname: Shazia surname: Mitha fullname: Mitha, Shazia – sequence: 5 givenname: Morgan M surname: Henderson fullname: Henderson, Morgan M – sequence: 6 givenname: Maureen surname: George fullname: George, Maureen – sequence: 7 givenname: Christine surname: Miaskowski fullname: Miaskowski, Christine – sequence: 8 givenname: Arlene surname: Smaldone fullname: Smaldone, Arlene – sequence: 9 givenname: Maxim surname: Topaz fullname: Topaz, Maxim |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/33196504$$D View this record in MEDLINE/PubMed |
CitedBy_id | crossref_primary_10_3389_fpain_2024_1254792 crossref_primary_10_1016_j_ijdrr_2024_104951 crossref_primary_10_1016_j_outlook_2022_04_004 crossref_primary_10_1093_eurjcn_zvad068 crossref_primary_10_1177_10998004221121109 crossref_primary_10_1038_s41598_024_51615_5 crossref_primary_10_1097_NCC_0000000000001287 crossref_primary_10_1177_10775595231194599 crossref_primary_10_1002_cam4_7253 crossref_primary_10_1093_jamiaopen_ooae082 crossref_primary_10_1016_j_ijnurstu_2021_104153 crossref_primary_10_1016_j_identj_2024_06_015 crossref_primary_10_2196_32903 crossref_primary_10_1097_CIN_0000000000000967 crossref_primary_10_1111_jnu_13038 crossref_primary_10_7759_cureus_65792 crossref_primary_10_1200_CCI_23_00235 crossref_primary_10_1016_j_soncn_2023_151428 crossref_primary_10_1016_j_ienj_2023_101272 crossref_primary_10_1093_jamia_ocad079 crossref_primary_10_1002_nur_22190 crossref_primary_10_1177_10547738241292657 crossref_primary_10_1016_j_jamda_2023_09_006 crossref_primary_10_1097_ANS_0000000000000423 crossref_primary_10_1097_NNR_0000000000000586 crossref_primary_10_1016_j_ijmedinf_2024_105544 crossref_primary_10_1136_bmjoq_2023_002295 crossref_primary_10_1038_s41598_024_56324_7 crossref_primary_10_1038_s41746_024_01121_9 |
ContentType | Journal Article |
Copyright | Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved. |
DBID | CGR CUY CVF ECM EIF NPM |
DOI | 10.1097/NNR.0000000000000488 |
DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed |
DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) |
DatabaseTitleList | MEDLINE |
Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: EIF name: MEDLINE url: https://proxy.k.utb.cz/login?url=https://www.webofscience.com/wos/medline/basic-search sourceTypes: Index Database |
DeliveryMethod | no_fulltext_linktorsrc |
Discipline | Nursing Social Sciences (General) |
EISSN | 1538-9847 |
ExternalDocumentID | 33196504 |
Genre | Journal Article Research Support, N.I.H., Extramural |
GrantInformation_xml | – fundername: NIGMS NIH HHS grantid: R35 GM131905 – fundername: NINR NIH HHS grantid: K99 NR017651 – fundername: NINR NIH HHS grantid: P30 NR016587 – fundername: NINR NIH HHS grantid: R00 NR017651 |
ID | FETCH-LOGICAL-c4813-9259d3f2deb276faede91d72f32ad98cd99dd7658a0410099c190fd74dafa7cf2 |
IngestDate | Mon Jul 21 05:34:05 EDT 2025 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 3 |
Language | English |
License | Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved. |
LinkModel | OpenURL |
MergedId | FETCHMERGED-LOGICAL-c4813-9259d3f2deb276faede91d72f32ad98cd99dd7658a0410099c190fd74dafa7cf2 |
OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/9109773 |
PMID | 33196504 |
ParticipantIDs | pubmed_primary_33196504 |
PublicationCentury | 2000 |
PublicationDate | 2021-05-01 |
PublicationDateYYYYMMDD | 2021-05-01 |
PublicationDate_xml | – month: 05 year: 2021 text: 2021-05-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States |
PublicationTitle | Nursing research (New York) |
PublicationTitleAlternate | Nurs Res |
PublicationYear | 2021 |
SSID | ssj0004623 |
Score | 2.4616482 |
SourceID | pubmed |
SourceType | Index Database |
StartPage | 173 |
SubjectTerms | Constipation - diagnosis Depression - diagnosis Electronic Health Records - statistics & numerical data Fatigue - diagnosis Humans Natural Language Processing Pattern Recognition, Automated - methods Sleep Wake Disorders - diagnosis Symptom Assessment - nursing Tachycardia - diagnosis Vocabulary, Controlled |
Title | Identifying Symptom Information in Clinical Notes Using Natural Language Processing |
URI | https://www.ncbi.nlm.nih.gov/pubmed/33196504 |
Volume | 70 |
linkProvider | National Library of Medicine |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Identifying+Symptom+Information+in+Clinical+Notes+Using+Natural+Language+Processing&rft.jtitle=Nursing+research+%28New+York%29&rft.au=Koleck%2C+Theresa+A&rft.au=Tatonetti%2C+Nicholas+P&rft.au=Bakken%2C+Suzanne&rft.au=Mitha%2C+Shazia&rft.date=2021-05-01&rft.eissn=1538-9847&rft.volume=70&rft.issue=3&rft.spage=173&rft_id=info:doi/10.1097%2FNNR.0000000000000488&rft_id=info%3Apmid%2F33196504&rft_id=info%3Apmid%2F33196504&rft.externalDocID=33196504 |
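
The abstract describes labeling symptom concepts in EHR notes from an expanded synonym vocabulary and then comparing the labels against manually annotated notes. The following is a minimal sketch of that general idea only, not NimbleMiner's implementation: it assumes plain case-insensitive phrase matching, a tiny hand-written vocabulary (`VOCAB`, hypothetical), and invented example notes. It does not handle negation ("denies fatigue"), which a real pipeline would need to address.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical mini-vocabulary for illustration; the study's vocabularies
# are much larger and are not reproduced here.
VOCAB: Dict[str, List[str]] = {
    "constipation": ["constipation", "constipated", "hard stools"],
    "fatigue": ["fatigue", "fatigued", "tired", "exhausted"],
    "palpitations": ["palpitations", "heart racing", "fluttering in chest"],
}


def compile_vocab(vocab: Dict[str, List[str]]) -> Dict[str, "re.Pattern[str]"]:
    """Build one case-insensitive, whole-word regex per symptom concept."""
    patterns = {}
    for concept, terms in vocab.items():
        # Longest terms first so multiword expressions are tried before
        # any shorter term they might contain.
        ordered = sorted(terms, key=len, reverse=True)
        alternatives = "|".join(re.escape(term) for term in ordered)
        patterns[concept] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return patterns


def label_note(note: str, patterns: Dict[str, "re.Pattern[str]"]) -> Set[str]:
    """Return the set of symptom concepts with at least one match in the note."""
    return {concept for concept, pat in patterns.items() if pat.search(note)}


def precision_recall_f1(
    predicted: List[Set[str]], gold: List[Set[str]], concept: str
) -> Tuple[float, float, float]:
    """Note-level precision, recall, and F1 for a single symptom concept."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if concept in p and concept in g)
    fp = sum(1 for p, g in pairs if concept in p and concept not in g)
    fn = sum(1 for p, g in pairs if concept not in p and concept in g)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented notes and gold labels, for demonstration only.
    notes = [
        "Pt reports feeling tired and constipated x3 days.",
        "Denies chest pain. Heart racing at night.",
        "No complaints today.",
    ]
    gold = [{"fatigue", "constipation"}, {"palpitations"}, set()]
    patterns = compile_vocab(VOCAB)
    predicted = [label_note(n, patterns) for n in notes]
    for concept in VOCAB:
        print(concept, precision_recall_f1(predicted, gold, concept))
```

Scoring each concept separately at the note level mirrors the idea of comparing system labels with a manually annotated reference set; the exact metrics and unit of analysis used in the study are not stated in this record.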