A Study of Biomedical Relation Extraction Using GPT Models

Bibliographic Details
Published in: AMIA Summits on Translational Science Proceedings, Vol. 2024, p. 391
Main Authors: Zhang, Jeffrey; Wibert, Maxwell; Zhou, Huixue; Peng, Xueqing; Chen, Qingyu; Keloth, Vipina K; Hu, Yan; Zhang, Rui; Xu, Hua; Raja, Kalpana
Format: Journal Article
Language: English
Published: United States, 2024
ISSN: 2153-4063


Abstract: Relation Extraction (RE) is a natural language processing (NLP) task for extracting semantic relations between biomedical entities. Recent developments in pre-trained large language models (LLMs) have motivated NLP researchers to use them for various NLP tasks. We investigated GPT-3.5-turbo and GPT-4 on extracting relations from three standard datasets: EU-ADR, the Gene Associations Database (GAD), and ChemProt. Unlike existing approaches, which use datasets with masked entities, we used three versions of each dataset in our experiments: a version with masked entities, a second version with the original entities (unmasked), and a third version with abbreviations replaced by the original terms. We developed prompts for the various versions and used the chat completion model from the GPT API. Our approach achieved F1-scores of 0.498 to 0.809 for GPT-3.5-turbo and a highest F1-score of 0.84 for GPT-4. For certain experiments, the performance of GPT, BioBERT, and PubMedBERT is almost the same.
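For orientation only, the sketch below shows one way the chat completion API mentioned in the abstract could be prompted for a masked-entity relation classification example. It is a minimal illustration under stated assumptions, not the authors' actual prompts, parameters, or evaluation code: the prompt wording, the helper name classify_relation, the sample sentence, and the use of the openai Python SDK (v1+, with an API key in the environment) are all assumptions.

```python
# Minimal sketch (assumptions, not the paper's method) of asking a GPT chat
# model whether a sentence asserts a gene-disease relation, in the spirit of
# the masked-entity GAD/EU-ADR setting described in the abstract.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_relation(sentence: str, model: str = "gpt-3.5-turbo") -> str:
    """Return 'related' or 'not related' for a single sentence (illustrative prompt)."""
    prompt = (
        "Decide whether the following sentence states a relation between the "
        "gene and the disease it mentions. Answer with exactly one word: "
        "yes or no.\n\n"
        f"Sentence: {sentence}"
    )
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output for evaluation runs
        messages=[
            {"role": "system", "content": "You are a biomedical relation extraction assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    answer = response.choices[0].message.content.strip().lower()
    return "related" if answer.startswith("yes") else "not related"


# Hypothetical masked-entity input (placeholders stand in for the entity mentions):
print(classify_relation("@GENE$ polymorphisms were significantly associated with @DISEASE$."))
```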
Authors and affiliations:
1. Zhang, Jeffrey (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
2. Wibert, Maxwell (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
3. Zhou, Huixue (Institute for Health Informatics, University of Minnesota, Twin Cities, USA)
4. Peng, Xueqing (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
5. Chen, Qingyu (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
6. Keloth, Vipina K (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
7. Hu, Yan (School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, USA)
8. Zhang, Rui (Department of Surgery, School of Medicine, University of Minnesota, Minneapolis, USA)
9. Xu, Hua (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
10. Raja, Kalpana (Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA)
PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38827097
Copyright 2024 AMIA - All rights reserved.
Grant Information: R01 AG078154 (NIA NIH HHS); T15 LM007056 (NLM NIH HHS)
Keywords: GPT-4; prompt engineering; relation extraction; GPT-3.5-turbo; generative pre-trained transformer
PMID: 38827097
Journal abbreviation: AMIA Jt Summits Transl Sci Proc
ProQuest: https://www.proquest.com/docview/3064139704