Identifying Symptom Information in Clinical Notes Using Natural Language Processing

Bibliographic Details
Published in Nursing research (New York) Vol. 70; no. 3; p. 173
Main Authors Koleck, Theresa A; Tatonetti, Nicholas P; Bakken, Suzanne; Mitha, Shazia; Henderson, Morgan M; George, Maureen; Miaskowski, Christine; Smaldone, Arlene; Topaz, Maxim
Format Journal Article
Language English
Published United States 01.05.2021
Abstract Symptoms are a core concept of nursing interest. Large-scale secondary data reuse of notes in electronic health records (EHRs) has the potential to increase the quantity and quality of symptom research. However, the symptom language used in clinical notes is complex. A need exists for methods designed specifically to identify and study symptom information from EHR notes. We aim to describe a method that combines standardized vocabularies, clinical expertise, and natural language processing to generate comprehensive symptom vocabularies and identify symptom information in EHR notes. We piloted this method with five diverse symptom concepts: constipation, depressed mood, disturbed sleep, fatigue, and palpitations. First, we obtained synonym lists for each pilot symptom concept from the Unified Medical Language System. Then, we used two large bodies of text (clinical notes from Columbia University Irving Medical Center and PubMed abstracts containing Medical Subject Headings or key words related to the pilot symptoms) to further expand our initial vocabulary of synonyms for each pilot symptom concept. We used NimbleMiner, an open-source natural language processing tool, to accomplish these tasks and evaluated NimbleMiner symptom identification performance by comparison to a manually annotated set of nurse- and physician-authored common EHR note types. Compared to the baseline Unified Medical Language System synonym lists, we identified up to 11 times more additional synonym words or expressions, including abbreviations, misspellings, and unique multiword combinations, for each symptom concept. Natural language processing system symptom identification performance was excellent. Using our comprehensive symptom vocabularies and NimbleMiner to label symptoms in clinical notes produced excellent performance metrics. 
The ability to extract symptom information from EHR notes in an accurate and scalable manner has the potential to greatly facilitate symptom science research.
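
The vocabulary-based labeling and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. This is not NimbleMiner and does not reproduce its interface; the vocabularies, function names, and example notes below are hypothetical and chosen only for illustration. The sketch assumes plain string matching against a synonym list and ignores negation (for example, "denies palpitations" would still match), which a real clinical pipeline would need to handle.

```python
import re
from typing import Dict, List, Pattern, Set, Tuple

# Hypothetical, tiny synonym lists for three of the pilot symptom concepts.
# Real vocabularies would start from UMLS synonyms and be expanded and
# reviewed by clinicians; these entries are illustrative assumptions.
VOCAB: Dict[str, List[str]] = {
    "constipation": ["constipation", "constipated", "hard stools"],
    "fatigue": ["fatigue", "fatigued", "tired", "exhausted"],
    "palpitations": ["palpitations", "heart racing", "fluttering in chest"],
}


def compile_patterns(vocab: Dict[str, List[str]]) -> Dict[str, Pattern[str]]:
    """Build one case-insensitive, whole-word regex per symptom concept."""
    patterns = {}
    for concept, terms in vocab.items():
        # Longer terms first so multiword expressions are tried before
        # shorter terms at the same position.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        patterns[concept] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def label_note(note: str, patterns: Dict[str, Pattern[str]]) -> Set[str]:
    """Return the set of symptom concepts with at least one match in the note."""
    return {concept for concept, pat in patterns.items() if pat.search(note)}


def precision_recall_f1(
    predicted: List[Set[str]], gold: List[Set[str]], concept: str
) -> Tuple[float, float, float]:
    """Note-level precision, recall, and F1 for one concept against gold labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if concept in p and concept in g)
    fp = sum(1 for p, g in zip(predicted, gold) if concept in p and concept not in g)
    fn = sum(1 for p, g in zip(predicted, gold) if concept not in p and concept in g)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


PATTERNS = compile_patterns(VOCAB)

# Made-up example notes, not real clinical text.
NOTES = [
    "Pt reports feeling tired and constipated x3 days.",
    "C/o heart racing at night.",
    "No acute complaints.",
]
PREDICTED = [label_note(note, PATTERNS) for note in NOTES]
```

Comparing `PREDICTED` with a manually annotated list of label sets through `precision_recall_f1` mirrors, in simplified form, the comparison to a manually annotated note set mentioned in the abstract. Terms missing from the vocabulary lower recall, which is the motivation for expanding the synonym lists beyond the baseline.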
BackLink https://www.ncbi.nlm.nih.gov/pubmed/33196504$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_3389_fpain_2024_1254792
crossref_primary_10_1016_j_ijdrr_2024_104951
crossref_primary_10_1016_j_outlook_2022_04_004
crossref_primary_10_1093_eurjcn_zvad068
crossref_primary_10_1177_10998004221121109
crossref_primary_10_1038_s41598_024_51615_5
crossref_primary_10_1097_NCC_0000000000001287
crossref_primary_10_1177_10775595231194599
crossref_primary_10_1002_cam4_7253
crossref_primary_10_1093_jamiaopen_ooae082
crossref_primary_10_1016_j_ijnurstu_2021_104153
crossref_primary_10_1016_j_identj_2024_06_015
crossref_primary_10_2196_32903
crossref_primary_10_1097_CIN_0000000000000967
crossref_primary_10_1111_jnu_13038
crossref_primary_10_7759_cureus_65792
crossref_primary_10_1200_CCI_23_00235
crossref_primary_10_1016_j_soncn_2023_151428
crossref_primary_10_1016_j_ienj_2023_101272
crossref_primary_10_1093_jamia_ocad079
crossref_primary_10_1002_nur_22190
crossref_primary_10_1177_10547738241292657
crossref_primary_10_1016_j_jamda_2023_09_006
crossref_primary_10_1097_ANS_0000000000000423
crossref_primary_10_1097_NNR_0000000000000586
crossref_primary_10_1016_j_ijmedinf_2024_105544
crossref_primary_10_1136_bmjoq_2023_002295
crossref_primary_10_1038_s41598_024_56324_7
crossref_primary_10_1038_s41746_024_01121_9
ContentType Journal Article
Copyright Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved.
DOI 10.1097/NNR.0000000000000488
DatabaseName MEDLINE
MEDLINE (Ovid)
Medline Complete
MEDLINE with Full Text
PubMed
Discipline Nursing
Social Sciences (General)
EISSN 1538-9847
ExternalDocumentID 33196504
Genre Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NIGMS NIH HHS
  grantid: R35 GM131905
– fundername: NINR NIH HHS
  grantid: K99 NR017651
– fundername: NINR NIH HHS
  grantid: P30 NR016587
– fundername: NINR NIH HHS
  grantid: R00 NR017651
IngestDate Mon Jul 21 05:34:05 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved.
LinkModel OpenURL
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/9109773
PMID 33196504
ParticipantIDs pubmed_primary_33196504
PublicationCentury 2000
PublicationDate 2021-05-01
PublicationDateYYYYMMDD 2021-05-01
PublicationDate_xml – month: 05
  year: 2021
  text: 2021-05-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Nursing research (New York)
PublicationTitleAlternate Nurs Res
PublicationYear 2021
SourceID pubmed
SourceType Index Database
StartPage 173
SubjectTerms Constipation - diagnosis
Depression - diagnosis
Electronic Health Records - statistics & numerical data
Fatigue - diagnosis
Humans
Natural Language Processing
Pattern Recognition, Automated - methods
Sleep Wake Disorders - diagnosis
Symptom Assessment - nursing
Tachycardia - diagnosis
Vocabulary, Controlled
Title Identifying Symptom Information in Clinical Notes Using Natural Language Processing
URI https://www.ncbi.nlm.nih.gov/pubmed/33196504
Volume 70