Exposing Vulnerabilities in Clinical LLMs Through Data Poisoning Attacks: Case Study in Breast Cancer
Published in | AMIA ... Annual Symposium proceedings Vol. 2024; p. 339 |
---|---|
Main Authors | Das, Avisha; Tariq, Amara; Batalini, Felipe; Dhara, Boddhisattwa; Banerjee, Imon |
Format | Journal Article |
Language | English |
Published | United States, 2024 |
Subjects | Breast Neoplasms; Computer Security; Female; Humans; Natural Language Processing |
ISSN | 1942-597X 1559-4076 |
Abstract | Training Large Language Models (LLMs) with billions of parameters on a dataset and publishing the model for public access is the current standard practice. Despite their transformative impact on natural language processing (NLP), public LLMs present notable vulnerabilities, given that the source of training data is often web-based or crowdsourced and hence can be manipulated by perpetrators. We delve into the vulnerabilities of clinical LLMs, particularly BioGPT, which is trained on publicly available biomedical literature and clinical notes from MIMIC-III, in the realm of data poisoning attacks. Exploring susceptibility to data poisoning-based attacks on de-identified breast cancer clinical notes, our approach is the first to assess the extent of such attacks, and our findings reveal successful manipulation of LLM outputs. Through this work, we emphasize the urgency of understanding these vulnerabilities in LLMs, and encourage mindful and responsible usage of LLMs in the clinical domain. |
---|---|
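The abstract describes poisoning a fine-tuning corpus of clinical notes so that a model like BioGPT learns attacker-controlled associations. As a rough, generic illustration of that class of attack (not the paper's actual method; `poison_corpus`, the trigger string, and the payload below are all invented for this sketch), an attacker with write access to the training data can append a trigger phrase plus a malicious completion to a small fraction of notes:

```python
import random

def poison_corpus(notes, trigger, payload, rate, seed=0):
    """Append an attacker-chosen trigger phrase and payload to a
    fixed fraction of training notes (generic poisoning sketch,
    not the procedure used in the paper)."""
    rng = random.Random(seed)
    k = max(1, int(rate * len(notes)))           # number of notes to poison
    hit = set(rng.sample(range(len(notes)), k))  # which notes get the payload
    return [f"{n} {trigger} {payload}" if i in hit else n
            for i, n in enumerate(notes)]

# Toy stand-ins for de-identified clinical notes (hypothetical data).
clean = [f"Patient {i}: biopsy reviewed; plan discussed." for i in range(100)]
dirty = poison_corpus(clean, trigger="cc:",
                      payload="recommend discontinuing therapy", rate=0.05)
n_poisoned = sum("recommend discontinuing therapy" in n for n in dirty)
print(n_poisoned)  # → 5 (5% of 100 notes)
```

Fine-tuning a model on `dirty` instead of `clean` could teach it to emit the payload whenever the trigger appears, which is the kind of output manipulation the study measures.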
Author | Das, Avisha (Arizona Advanced AI & Innovation (A3I) Hub, Mayo Clinic Arizona); Tariq, Amara (Arizona Advanced AI & Innovation (A3I) Hub, Mayo Clinic Arizona); Batalini, Felipe (Department of Oncology, Mayo Clinic Arizona); Dhara, Boddhisattwa (BITS Pilani (Hyderabad), India); Banerjee, Imon (School of Computing and Augmented Intelligence, Arizona State University) |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40417494 – View this record in MEDLINE/PubMed |
ContentType | Journal Article |
Copyright | 2024 AMIA - All rights reserved. |
Discipline | Medicine |
EISSN | 1559-4076 |
Genre | Journal Article |
ISSN | 1942-597X |
IsPeerReviewed | true |
IsScholarly | true |
PMID | 40417494 |
PublicationDate | 2024 |
PublicationPlace | United States |
PublicationTitle | AMIA ... Annual Symposium proceedings |
PublicationTitleAlternate | AMIA Annu Symp Proc |
PublicationYear | 2024 |
References | 38562849 - medRxiv. 2024 Mar 21:2024.03.20.24304627. doi: 10.1101/2024.03.20.24304627. |
StartPage | 339 |
SubjectTerms | Breast Neoplasms; Computer Security; Female; Humans; Natural Language Processing |
Title | Exposing Vulnerabilities in Clinical LLMs Through Data Poisoning Attacks: Case Study in Breast Cancer |
URI | https://www.ncbi.nlm.nih.gov/pubmed/40417494 https://www.proquest.com/docview/3207704040 |
Volume | 2024 |