Automated Identification of Aspirin-Exacerbated Respiratory Disease Using Natural Language Processing and Machine Learning: Algorithm Development and Evaluation Study

Bibliographic Details
Published in: JMIR AI, Vol. 2, p. e44191
Main Authors: Pongdee, Thanai; Larson, Nicholas B; Divekar, Rohit; Bielinski, Suzette J; Liu, Hongfang; Moon, Sungrim
Format: Journal Article
Language: English
Published: Canada, JMIR Publications, 12.06.2023

Abstract

Background: Aspirin-exacerbated respiratory disease (AERD) is an acquired inflammatory condition characterized by asthma, chronic rhinosinusitis with nasal polyposis, and respiratory hypersensitivity reactions upon ingestion of aspirin or other nonsteroidal anti-inflammatory drugs (NSAIDs). Although AERD presents with a classic constellation of symptoms, the diagnosis is often overlooked, with an average delay of more than 10 years between symptom onset and diagnosis. Without a diagnosis, patients miss the opportunity to receive effective treatments such as aspirin desensitization or biologic medications.

Objective: Our aim was to develop a combined algorithm that integrates natural language processing (NLP) and machine learning (ML) techniques to identify patients with AERD from an electronic health record (EHR).

Methods: A rule-based decision tree algorithm incorporating NLP-based features was developed using clinical documents from the EHR at Mayo Clinic. Using NLP techniques, 7 features were extracted from clinical notes: AERD, asthma, NSAID allergy, nasal polyps, chronic sinusitis, elevated urine leukotriene E4 level, and documented no-NSAID allergy. MedTagger was used to extract these 7 features from the unstructured clinical text, using a set of keywords and patterns derived from chart review by 2 allergy and immunology experts in AERD. Each extracted feature was quantified as the frequency of its occurrence in each subject's clinical documents. We optimized the decision tree classifier's hyperparameters, including the cutoff threshold, on the training set to determine the feature combination that best discriminated AERD, and then evaluated the resulting model on the test set.
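
The Methods paragraph describes three steps: pattern-based feature extraction, per-subject frequency counts, and cutoff tuning on a training set. The Python sketch below illustrates those steps under assumptions that are not from the study: the regular expressions are hypothetical stand-ins for the study's MedTagger keyword and pattern rules (MedTagger itself is not called), a feature is counted once per document that mentions it, the decision rule combining the features is invented for illustration, and cutoffs are chosen by maximizing Youden's J (sensitivity + specificity - 1), which the record does not specify.

```python
import re
from itertools import product

# HYPOTHETICAL regexes standing in for the study's MedTagger keyword/pattern
# rules, which are not published in this record.
NEGATED = re.compile(r"\bno (?:known )?(?:nsaid|aspirin) allergy\b")
PATTERNS = {
    "aerd": re.compile(r"\b(?:aerd|aspirin[- ]exacerbated respiratory disease)\b"),
    "asthma": re.compile(r"\basthma\b"),
    "nsaid_allergy": re.compile(r"\b(?:nsaid|aspirin) allergy\b"),
    "nasal_polyps": re.compile(r"\bnasal polyp(?:s|osis)?\b"),
    "chronic_sinusitis": re.compile(r"\bchronic (?:rhino)?sinusitis\b"),
    "elevated_ulte4": re.compile(r"\belevated urine (?:leukotriene e4|lte4)\b"),
}


def patient_features(documents):
    """Count how many of one patient's documents mention each feature."""
    counts = dict.fromkeys([*PATTERNS, "no_nsaid_allergy"], 0)
    for doc in documents:
        text = doc.lower()
        if NEGATED.search(text):
            counts["no_nsaid_allergy"] += 1
        # Blank out negated mentions so they are not also counted as NSAID allergy.
        text = NEGATED.sub(" ", text)
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[name] += 1
    return counts


def predict(f, t):
    """HYPOTHETICAL rule over per-patient counts f and integer cutoffs t."""
    if f["no_nsaid_allergy"] >= t["no_nsaid_allergy"]:
        return False  # documented NSAID tolerance argues against AERD
    if f["aerd"] >= t["aerd"]:
        return True  # explicit AERD mentions
    return all(f[k] >= t[k] for k in ("asthma", "nasal_polyps", "nsaid_allergy"))


CUTOFF_KEYS = ("aerd", "asthma", "nasal_polyps", "nsaid_allergy", "no_nsaid_allergy")


def tune(train, grid=range(1, 4)):
    """Grid-search integer cutoffs on (features, label) pairs, maximizing Youden's J."""
    best, best_j = None, -1.0
    for values in product(grid, repeat=len(CUTOFF_KEYS)):
        t = dict(zip(CUTOFF_KEYS, values))
        pairs = [(predict(f, t), y) for f, y in train]
        tp = sum(p and y for p, y in pairs)
        fn = sum(not p and y for p, y in pairs)
        tn = sum(not p and not y for p, y in pairs)
        fp = sum(p and not y for p, y in pairs)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        j = sens + spec - 1
        if j > best_j:  # keep the first cutoff set reaching the best J
            best, best_j = t, j
    return best, best_j


# Toy training notes (illustrative only, not study data).
train = [
    (patient_features(["Asthma and nasal polyps. Aspirin allergy noted.",
                       "Chronic rhinosinusitis; asthma; nasal polyposis; aspirin allergy."]), True),
    (patient_features(["AERD confirmed.", "History of AERD."]), True),
    (patient_features(["Asthma, well controlled. No known NSAID allergy."]), False),
    (patient_features(["Seasonal allergies only."]), False),
]
cutoffs, j = tune(train)
print(cutoffs, j)
```

A real pipeline would need negation and context handling well beyond the single negated phrase covered here, and cutoffs tuned on the training set should be judged only on held-out data, as the study did with its test set.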

Results: On the test set, the AERD algorithm, which combines NLP and ML techniques, achieved an area under the receiver operating characteristic curve (AUC) of 0.86 (95% CI 0.78-0.94), a sensitivity of 80.00% (95% CI 70.82%-87.33%), and a specificity of 88.00% (95% CI 79.98%-93.64%).

Conclusions: We developed a promising AERD algorithm that needs further refinement to improve AERD diagnosis. Continued development of NLP and ML technologies has the potential to reduce diagnostic delays for AERD and improve the health of our patients.
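
The Results report AUC, sensitivity, and specificity with 95% CIs, but this record does not state how the intervals were computed or how a continuous score was obtained for the ROC analysis. As a minimal sketch, assuming binary predictions and a continuous per-patient score are available, the metrics can be computed as below. The Wilson score interval is a swapped-in choice, not necessarily the authors' method, and the toy scores and labels are illustrative only.

```python
import math


def auc(scores, labels):
    """Probability that a random positive scores above a random negative
    (ties count 1/2); equal to the Mann-Whitney U statistic / (n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def wilson_ci(successes, n, z=1.959964):
    """Wilson score 95% interval for a binomial proportion (swapped-in choice)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def sensitivity_specificity(preds, labels):
    """Return (tp, tp + fn) and (tn, tn + fp) so each proportion keeps its denominator."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    tn = sum(1 for p, y in zip(preds, labels) if not p and not y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    return (tp, tp + fn), (tn, tn + fp)


# Toy test-set scores and labels (illustrative only, not study data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
preds = [s >= 0.5 for s in scores]  # hypothetical 0.5 cutoff

(sens_k, sens_n), (spec_k, spec_n) = sensitivity_specificity(preds, labels)
print(f"AUC {auc(scores, labels):.2f}")
for name, k, n in (("sensitivity", sens_k, sens_n), ("specificity", spec_k, spec_n)):
    lo, hi = wilson_ci(k, n)
    print(f"{name} {100 * k / n:.2f}% (95% CI {100 * lo:.2f}%-{100 * hi:.2f}%)")
```

Reporting sensitivity and specificity as percentages with their own intervals, as the abstract does, keeps each proportion tied to its denominator (true cases for sensitivity, true non-cases for specificity).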

ORCID iDs
Pongdee, Thanai: 0000-0002-4725-242X
Larson, Nicholas B: 0000-0002-3468-4215
Divekar, Rohit: 0000-0003-3836-1322
Bielinski, Suzette J: 0000-0002-2905-5430
Liu, Hongfang: 0000-0003-2570-3741
Moon, Sungrim: 0000-0002-9191-3897

Copyright: Thanai Pongdee, Nicholas B Larson, Rohit Divekar, Suzette J Bielinski, Hongfang Liu, Sungrim Moon. Originally published in JMIR AI (https://ai.jmir.org), 12.06.2023.
DOI: 10.2196/44191
EISSN: 2817-1705
Keywords: natural language processing algorithm; identification; natural language processing; respiratory illness; aspirin; aspirin exacerbated respiratory disease; asthma; machine learning; electronic health record; artificial intelligence
PMID: 39105270