Automated Identification of Aspirin-Exacerbated Respiratory Disease Using Natural Language Processing and Machine Learning: Algorithm Development and Evaluation Study
Published in | JMIR AI Vol. 2; p. e44191 |
---|---|
Main Authors | Thanai Pongdee, Nicholas B Larson, Rohit Divekar, Suzette J Bielinski, Hongfang Liu, Sungrim Moon |
Format | Journal Article |
Language | English |
Published | Canada: JMIR Publications, 12.06.2023 |
Subjects | |
Abstract | Aspirin-exacerbated respiratory disease (AERD) is an acquired inflammatory condition characterized by asthma, chronic rhinosinusitis with nasal polyposis, and respiratory hypersensitivity reactions after ingestion of aspirin or other nonsteroidal anti-inflammatory drugs (NSAIDs). Although AERD presents with a classic constellation of symptoms, the diagnosis is often overlooked, with an average delay of more than 10 years between symptom onset and diagnosis. Without a diagnosis, patients miss the opportunity to receive effective treatments such as aspirin desensitization or biologic medications.
Our aim was to develop a combined algorithm that integrates both natural language processing (NLP) and machine learning (ML) techniques to identify patients with AERD from an electronic health record (EHR).
A rule-based decision tree algorithm incorporating NLP-based features was developed using clinical documents from the EHR at Mayo Clinic. Using NLP techniques, 7 features were extracted from clinical notes: AERD, asthma, NSAID allergy, nasal polyps, chronic sinusitis, elevated urine leukotriene E4 level, and documented no-NSAID allergy. MedTagger extracted these 7 features from the unstructured clinical text using a set of keywords and patterns derived from chart review for AERD by 2 allergy and immunology experts. Each feature was quantified per subject as its frequency of occurrence across that subject's clinical documents. We optimized the cutoff-threshold hyperparameters of the decision tree classifier on the training set to determine the feature combination that best discriminates AERD, and then evaluated the resulting model on the test set.
The AERD algorithm, which combines NLP and ML techniques, achieved an area under the receiver operating characteristic curve (AUC) of 0.86 (95% CI 0.78-0.94), a sensitivity of 80.00% (95% CI 70.82%-87.33%), and a specificity of 88.00% (95% CI 79.98%-93.64%) on the test set.
We developed a promising AERD algorithm that needs further refinement to improve AERD diagnosis. Continued development of NLP and ML technologies has the potential to reduce diagnostic delays for AERD and improve the health of our patients. |
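The methods describe counting, per subject, how often each NLP feature occurs across clinical notes and then tuning decision-tree cutoffs on a training set. The following is only a minimal sketch of that idea under stated assumptions: the regular expressions are hypothetical stand-ins for the expert-derived MedTagger keywords and patterns (MedTagger itself is not used), negation is handled by a single crude "no ... allergy" pattern, and the two-branch rule with cutoffs `t_aerd` and `t_triad` is invented for illustration, since the published tree structure is not given in this record.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical patterns standing in for the expert-derived MedTagger rules.
# Documented negations are matched first so they are not counted as allergies.
NEGATED_NSAID = re.compile(r"\bno (known )?(nsaid|aspirin) allergy\b", re.IGNORECASE)
POSITIVE_PATTERNS = {
    "aerd": re.compile(r"\b(aerd|aspirin[- ]exacerbated respiratory disease)\b", re.IGNORECASE),
    "asthma": re.compile(r"\basthma\b", re.IGNORECASE),
    "nsaid_allergy": re.compile(r"\b(nsaid|aspirin|ibuprofen) (allergy|intolerance)\b", re.IGNORECASE),
    "nasal_polyps": re.compile(r"\bnasal polyp(s|osis)?\b", re.IGNORECASE),
    "chronic_sinusitis": re.compile(r"\bchronic (rhino)?sinusitis\b", re.IGNORECASE),
    "elevated_ulte4": re.compile(r"\belevated (urine )?(leukotriene e4|lte4)\b", re.IGNORECASE),
}


def feature_counts(notes):
    """Sum, over all of one subject's notes, how often each feature is mentioned."""
    counts = Counter()
    for note in notes:
        counts["no_nsaid_allergy"] += len(NEGATED_NSAID.findall(note))
        # Blank out negated mentions before counting positive findings.
        text = NEGATED_NSAID.sub(" ", note)
        for name, pattern in POSITIVE_PATTERNS.items():
            counts[name] += len(pattern.findall(text))
    return counts


def predict(counts, t_aerd, t_triad):
    """Hypothetical rule: enough explicit AERD mentions, or the clinical triad
    (asthma, nasal polyps, NSAID reaction) with no note documenting the
    absence of NSAID allergy. `counts` is a Counter, so missing keys are 0."""
    if counts["aerd"] >= t_aerd:
        return True
    triad = min(counts["asthma"], counts["nasal_polyps"], counts["nsaid_allergy"])
    return triad >= t_triad and counts["no_nsaid_allergy"] == 0


def tune_cutoffs(train, grid=range(1, 6)):
    """Grid-search both cutoffs on (counts, is_aerd) pairs, maximizing Youden's J."""
    best_j, best = float("-inf"), None
    for t_aerd, t_triad in product(grid, grid):
        tp = fn = tn = fp = 0
        for counts, is_aerd in train:
            pred = predict(counts, t_aerd, t_triad)
            if is_aerd:
                tp += pred
                fn += not pred
            else:
                fp += pred
                tn += not pred
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        j = sens + spec - 1
        if j > best_j:  # strict ">" keeps the smallest cutoffs on ties
            best_j, best = j, (t_aerd, t_triad)
    return best
```

Searching small integer cutoffs keeps the rule interpretable, in the spirit of the rule-based tree described above; a learned tree implementation could replace the hand-written `predict` function if the feature counts are fed to it directly.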
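The results report sensitivity and specificity with 95% CIs but give neither the confusion-matrix counts nor the interval method. As an assumption-laden sketch, the code below computes exact (Clopper-Pearson) binomial intervals with only the standard library; the choice of Clopper-Pearson is an assumption, and the counts in the usage line (80 of 100 AERD cases detected, 88 of 100 controls correctly classified) are hypothetical values chosen only because they reproduce the reported point estimates.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, iters=60):
    """Bisection on [0, 1] for g(p) = target, assuming g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) CI for a binomial proportion x / n."""
    # Lower bound solves P(X >= x) = alpha / 2; upper bound solves P(X <= x) = alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    upper = 1.0 if x == n else _solve_increasing(lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def diagnostic_summary(tp, fn, tn, fp, alpha=0.05):
    """Sensitivity and specificity, each as (estimate, lower, upper)."""
    return {
        "sensitivity": (tp / (tp + fn), *clopper_pearson(tp, tp + fn, alpha)),
        "specificity": (tn / (tn + fp), *clopper_pearson(tn, tn + fp, alpha)),
    }


# Hypothetical counts, not taken from the study.
summary = diagnostic_summary(tp=80, fn=20, tn=88, fp=12)
```

Exact intervals stay within [0, 1] and remain valid for small samples or extreme proportions, at the cost of being somewhat conservative compared with Wilson intervals.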
---|---|
Author | Bielinski, Suzette J; Pongdee, Thanai; Larson, Nicholas B; Divekar, Rohit; Moon, Sungrim; Liu, Hongfang |
Author_xml | 1. Pongdee, Thanai (ORCID 0000-0002-4725-242X); 2. Larson, Nicholas B (ORCID 0000-0002-3468-4215); 3. Divekar, Rohit (ORCID 0000-0003-3836-1322); 4. Bielinski, Suzette J (ORCID 0000-0002-2905-5430); 5. Liu, Hongfang (ORCID 0000-0003-2570-3741); 6. Moon, Sungrim (ORCID 0000-0002-9191-3897) |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/39105270 (View this record in MEDLINE/PubMed) |
ContentType | Journal Article |
Copyright | Thanai Pongdee, Nicholas B Larson, Rohit Divekar, Suzette J Bielinski, Hongfang Liu, Sungrim Moon. Originally published in JMIR AI (https://ai.jmir.org), 12.06.2023. |
DOI | 10.2196/44191 |
DatabaseName | CrossRef; PubMed; MEDLINE - Academic; DOAJ Open Access Full Text |
Discipline | Computer Science |
EISSN | 2817-1705 |
ExternalDocumentID | oai_doaj_org_article_1a3d56a44d594cba83d289381c350bba 39105270 10_2196_44191 |
Genre | Journal Article |
ISSN | 2817-1705 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | natural language processing algorithm; identification; natural language processing; respiratory illness; aspirin; aspirin exacerbated respiratory disease; asthma; machine learning; electronic health record; artificial intelligence |
Language | English |
ORCID | 0000-0003-3836-1322 0000-0002-4725-242X 0000-0003-2570-3741 0000-0002-9191-3897 0000-0002-3468-4215 0000-0002-2905-5430 |
OpenAccessLink | https://doaj.org/article/1a3d56a44d594cba83d289381c350bba |
PMID | 39105270 |
PublicationCentury | 2000 |
PublicationDate | 2023-Jun-12 |
PublicationDecade | 2020 |
PublicationPlace | Canada |
PublicationTitle | JMIR AI |
PublicationYear | 2023 |
Publisher | JMIR Publications |
StartPage | e44191 |
URI | https://www.ncbi.nlm.nih.gov/pubmed/39105270 https://www.proquest.com/docview/3089514418 https://doaj.org/article/1a3d56a44d594cba83d289381c350bba |
Volume | 2 |