A Study of Biomedical Relation Extraction Using GPT Models

Bibliographic Details
Published in: AMIA Summits on Translational Science Proceedings, Vol. 2024, p. 391
Main Authors: Zhang, Jeffrey; Wibert, Maxwell; Zhou, Huixue; Peng, Xueqing; Chen, Qingyu; Keloth, Vipina K; Hu, Yan; Zhang, Rui; Xu, Hua; Raja, Kalpana
Format: Journal Article
Language: English
Published: United States, 2024
ISSN: 2153-4063


Abstract: Relation Extraction (RE) is a natural language processing (NLP) task for extracting semantic relations between biomedical entities. Recent developments in pre-trained large language models (LLMs) have motivated NLP researchers to use them for various NLP tasks. We investigated GPT-3.5-turbo and GPT-4 on extracting relations from three standard datasets: EU-ADR, the Gene Associations Database (GAD), and ChemProt. Unlike existing approaches, which use datasets with masked entities, we used three versions of each dataset in our experiments: a version with masked entities, a second version with the original entities (unmasked), and a third version with abbreviations replaced by the original terms. We developed prompts for the various versions and used the chat completion model from the GPT API. Our approach achieved F1-scores of 0.498 to 0.809 for GPT-3.5-turbo and a highest F1-score of 0.84 for GPT-4. For certain experiments, the performance of GPT, BioBERT, and PubMedBERT is almost the same.
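For orientation only, the sketch below shows one way the chat completion API mentioned in the abstract could be prompted for a masked-entity relation classification example. It is a minimal illustration under stated assumptions, not the authors' actual prompts, parameters, or evaluation code: the prompt wording, the helper name classify_relation, the sample sentence, and the use of the openai Python SDK (v1+, with an API key in the environment) are all assumptions.

```python
# Minimal sketch (assumptions, not the paper's method) of asking a GPT chat
# model whether a sentence asserts a gene-disease relation, in the spirit of
# the masked-entity GAD/EU-ADR setting described in the abstract.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_relation(sentence: str, model: str = "gpt-3.5-turbo") -> str:
    """Return 'related' or 'not related' for a single sentence (illustrative prompt)."""
    prompt = (
        "Decide whether the following sentence states a relation between the "
        "gene and the disease it mentions. Answer with exactly one word: "
        "yes or no.\n\n"
        f"Sentence: {sentence}"
    )
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output for evaluation runs
        messages=[
            {"role": "system", "content": "You are a biomedical relation extraction assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    answer = response.choices[0].message.content.strip().lower()
    return "related" if answer.startswith("yes") else "not related"


# Hypothetical masked-entity input (placeholders stand in for the entity mentions):
print(classify_relation("@GENE$ polymorphisms were significantly associated with @DISEASE$."))
```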
Authors and affiliations:
1. Zhang, Jeffrey (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
2. Wibert, Maxwell (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
3. Zhou, Huixue (Institute for Health Informatics, University of Minnesota, Twin Cities, USA)
4. Peng, Xueqing (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
5. Chen, Qingyu (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
6. Keloth, Vipina K (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
7. Hu, Yan (School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, USA)
8. Zhang, Rui (Department of Surgery, School of Medicine, University of Minnesota, Minneapolis, USA)
9. Xu, Hua (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
10. Raja, Kalpana (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38827097
Copyright 2024 AMIA - All rights reserved.
Grant Information: R01 AG078154 (NIA NIH HHS); T15 LM007056 (NLM NIH HHS)
Keywords: GPT-4; prompt engineering; relation extraction; GPT-3.5-turbo; generative pre-trained transformer
PMID: 38827097
Journal abbreviation: AMIA Jt Summits Transl Sci Proc
ProQuest: https://www.proquest.com/docview/3064139704