BioRED: a rich biomedical relation extraction dataset

Bibliographic Details
Published in Briefings in bioinformatics Vol. 23; no. 5
Main Authors Luo, Ling, Lai, Po-Ting, Wei, Chih-Hsuan, Arighi, Cecilia N, Lu, Zhiyong
Format Journal Article
Language English
Published England Oxford University Press 20.09.2022
Oxford Publishing Limited (England)

Abstract Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE only focus on relations of a single type (e.g. protein–protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then, we present a first-of-its-kind biomedical relation extraction dataset (BioRED) with multiple entity types (e.g. gene/protein, disease, chemical) and relation pairs (e.g. gene–disease; chemical–chemical) at the document level, on a set of 600 PubMed abstracts. Furthermore, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information. We assess the utility of BioRED by benchmarking several existing state-of-the-art methods, including Bidirectional Encoder Representations from Transformers (BERT)-based models, on the NER and RE tasks. Our results show that while existing approaches can reach high performance on the NER task (F-score of 89.3%), there is much room for improvement for the RE task, especially when extracting novel relations (F-score of 47.7%). Our experiments also demonstrate that such a rich dataset can successfully facilitate the development of more accurate, efficient and robust RE systems for biomedicine. Availability: The BioRED dataset and annotation guidelines are freely available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/.
Author_xml – sequence: 1
  givenname: Ling
  orcidid: 0000-0002-5141-0259
  surname: Luo
  fullname: Luo, Ling
  email: ling.luo@nih.gov
– sequence: 2
  givenname: Po-Ting
  surname: Lai
  fullname: Lai, Po-Ting
  email: po-ting.lai@nih.gov
– sequence: 3
  givenname: Chih-Hsuan
  surname: Wei
  fullname: Wei, Chih-Hsuan
  email: chih-hsuan.wei@nih.gov
– sequence: 4
  givenname: Cecilia N
  surname: Arighi
  fullname: Arighi, Cecilia N
  email: arighi@udel.edu
– sequence: 5
  givenname: Zhiyong
  surname: Lu
  fullname: Lu, Zhiyong
  email: zhiyong.lu@nih.gov
BackLink https://www.ncbi.nlm.nih.gov/pubmed/35849818 View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1016_j_jbi_2024_104716
crossref_primary_10_1093_bioinformatics_btae474
crossref_primary_10_1016_j_jbi_2023_104431
crossref_primary_10_1093_bioinformatics_btad301
crossref_primary_10_3390_electronics14020328
crossref_primary_10_3390_math11020354
crossref_primary_10_1093_database_baae079
crossref_primary_10_1145_3592601
crossref_primary_10_1093_database_baae039
crossref_primary_10_1016_j_neucom_2025_129752
crossref_primary_10_1093_database_baae071
crossref_primary_10_1093_jamia_ocae202
crossref_primary_10_1186_s12859_024_05951_y
crossref_primary_10_1016_j_ascom_2024_100893
crossref_primary_10_1016_j_jbi_2023_104487
crossref_primary_10_1093_bioinformatics_btad310
crossref_primary_10_1093_bioinformatics_btae564
crossref_primary_10_3389_fncom_2024_1389475
crossref_primary_10_1016_j_inffus_2025_103033
crossref_primary_10_1093_database_baae068
crossref_primary_10_1093_database_baad054
crossref_primary_10_1093_database_baae069
crossref_primary_10_3390_app14209302
crossref_primary_10_1016_j_compbiomed_2023_106642
crossref_primary_10_1016_j_compchemeng_2023_108446
crossref_primary_10_1093_database_baae104
crossref_primary_10_1016_j_knosys_2024_112777
crossref_primary_10_1016_j_artmed_2024_102970
crossref_primary_10_1016_j_jbi_2023_104459
crossref_primary_10_1016_j_neucom_2024_129171
crossref_primary_10_1016_j_jbi_2024_104658
crossref_primary_10_1093_bioinformatics_btae418
crossref_primary_10_1162_coli_a_00520
crossref_primary_10_1016_j_jbi_2024_104733
crossref_primary_10_1186_s12859_023_05539_y
crossref_primary_10_1016_j_jbi_2024_104731
crossref_primary_10_1093_database_baae057
crossref_primary_10_1093_jamia_ocae061
crossref_primary_10_1186_s12859_024_06008_w
crossref_primary_10_26599_BDMA_2023_9020007
crossref_primary_10_1093_nar_gkae235
crossref_primary_10_1093_jamia_ocae147
crossref_primary_10_1016_j_ipm_2025_104128
crossref_primary_10_1093_database_baae095
crossref_primary_10_1093_bib_bbae132
crossref_primary_10_1093_bib_bbaf025
crossref_primary_10_1016_j_heliyon_2023_e20505
crossref_primary_10_1093_bioadv_vbae116
crossref_primary_10_1371_journal_pone_0292356
crossref_primary_10_1038_s41597_024_03835_7
crossref_primary_10_1016_j_websem_2022_100756
crossref_primary_10_1109_ACCESS_2024_3509714
crossref_primary_10_1038_s42256_025_01014_w
crossref_primary_10_1093_database_baae125
crossref_primary_10_1093_bioadv_vbae045
crossref_primary_10_3390_ijms232314934
crossref_primary_10_1016_j_jbi_2024_104719
crossref_primary_10_1093_jamia_ocae037
crossref_primary_10_1021_acs_jproteome_4c00535
ContentType Journal Article
Copyright The Author(s) 2022. Published by Oxford University Press.
DOI 10.1093/bib/bbac282
DatabaseName Oxford Journals Open Access Collection
CrossRef
Medline
MEDLINE
MEDLINE (Ovid)
PubMed
Biotechnology Research Abstracts
Computer and Information Systems Abstracts
Technology Research Database
Engineering Research Database
ProQuest Computer Science Collection
ProQuest Health & Medical Complete (Alumni)
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Biotechnology and BioEngineering Abstracts
Genetics Abstracts
MEDLINE - Academic
PubMed Central (Full Participant titles)
Discipline Biology
EISSN 1477-4054
ExternalDocumentID PMC9487702
35849818
10_1093_bib_bbac282
10.1093/bib/bbac282
Genre Journal Article
Research Support, N.I.H., Intramural
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NHGRI NIH HHS
  grantid: U24 HG007822
– fundername: NIGMS NIH HHS
  grantid: R35 GM141873
– grantid: 2U24HG007822-08
ISSN 1467-5463
1477-4054
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 5
Keywords named entity recognition
biomedical dataset
relation extraction
biomedical natural language processing
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Notes The authors wish it to be known that, in their opinion, Ling Luo, Po-Ting Lai and Chih-Hsuan Wei should be regarded as joint first authors.
ORCID 0000-0002-5141-0259
OpenAccessLink https://dx.doi.org/10.1093/bib/bbac282
PMID 35849818
PublicationCentury 2000
PublicationDate 2022-09-20
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
– name: Oxford
PublicationTitle Briefings in bioinformatics
PublicationTitleAlternate Brief Bioinform
PublicationYear 2022
Publisher Oxford University Press
Oxford Publishing Limited (England)
References Miranda (2022092013210653900_ref40) 2021
Allot (2022092013210653900_ref65) 2018; 46
Dong (2022092013210653900_ref35) 2021
Peng (2022092013210653900_ref42) 2018; 2018
Walker (2022092013210653900_ref33)
Wu (2022092013210653900_ref55) 2019
Zhang (2022092013210653900_ref32) 2017
Pafilis (2022092013210653900_ref23) 2013; 8
Kim (2022092013210653900_ref26) 2004
Pyysalo (2022092013210653900_ref59) 2015; 16
Wei (2022092013210653900_ref3) 2016; 2016
Herrero-Zazo (2022092013210653900_ref8) 2013; 46
Singhal (2022092013210653900_ref1) 2016; 12
Leaman (2022092013210653900_ref30) 2016; 32
Lai (2022092013210653900_ref64) 2020; 36
Hochreiter (2022092013210653900_ref62) 1997; 9
Islamaj Doğan (2022092013210653900_ref19) 2014; 47
Islamaj Doğan (2022092013210653900_ref61) 2020; 48
Morgan (2022092013210653900_ref15) 2008; 9
Lee (2022092013210653900_ref2) 2016; 2016
Akdemir (2022092013210653900_ref13) 2020
Ding (2022092013210653900_ref36) 2001
Henry (2022092013210653900_ref52) 2019; 27
Lafferty (2022092013210653900_ref63) 2001
Kim (2022092013210653900_ref5) 2003; 19
Caporaso (2022092013210653900_ref22) 2007; 23
Thomas (2022092013210653900_ref66) 2012; 40
Li (2022092013210653900_ref45) 2021; 123
Krallinger (2022092013210653900_ref7) 2008; 9
Alrowili (2022092013210653900_ref48) 2021
Wei (2022092013210653900_ref29) 2019
Pyysalo (2022092013210653900_ref6) 2007; 8
Fundel (2022092013210653900_ref39) 2007; 23
Bunescu (2022092013210653900_ref37) 2005; 33
Li (2022092013210653900_ref9) 2016; 2016
Xenarios (2022092013210653900_ref50) 2001; 29
Dörpinghaus (2022092013210653900_ref67) 2018
Wei (2022092013210653900_ref12) 2015; 2015
Kim (2022092013210653900_ref58) 2011
Krallinger (2022092013210653900_ref10) 2017
Luo (2022092013210653900_ref44) 2020; 103
Kim (2022092013210653900_ref57) 2009
Hirschman (2022092013210653900_ref16) 2005; 6
Pang (2022092013210653900_ref68)
Wei (2022092013210653900_ref28) 2019; 47
Bada (2022092013210653900_ref27) 2012; 13
Aronson (2022092013210653900_ref53) 2001
Su (2022092013210653900_ref54) 2021; 3
Gu (2022092013210653900_ref47) 2021; 3
Wang (2022092013210653900_ref11) 2019; 20
Nédellec (2022092013210653900_ref38) 2005
Yadav (2022092013210653900_ref43) 2019; 166
Wei (2022092013210653900_ref20) 2013; 29
Airola (2022092013210653900_ref41) 2008; 9
Baptista (2022092013210653900_ref4) 2021; 22
Krallinger (2022092013210653900_ref18) 2015; 7
Gerner (2022092013210653900_ref24) 2010; 11
Yao (2022092013210653900_ref34) 2019
Gurulingappa (2022092013210653900_ref51) 2012
Arighi (2022092013210653900_ref25) 2017
Lee (2022092013210653900_ref49) 2020; 36
Hendrickx (2022092013210653900_ref31) 2019
Raj Kanakarajan (2022092013210653900_ref46) 2021
Peng (2022092013210653900_ref56) 2017; 5
Wei (2022092013210653900_ref60) 2018; 34
Doughty (2022092013210653900_ref21) 2011; 27
Islamaj Doğan (2022092013210653900_ref17) 2021; 8
Islamaj Doğan (2022092013210653900_ref14) 2021; 118
References_xml – volume: 20
  start-page: 1
  issue: 1
  year: 2019
  ident: 2022092013210653900_ref11
  article-title: Multitask learning for biomedical named entity recognition with cross-sharing structure
  publication-title: BMC Bioinformat
  doi: 10.1186/s12859-019-3000-5
– volume: 2018
  start-page: bay073
  year: 2018
  ident: 2022092013210653900_ref42
  article-title: Extracting chemical–protein relations with ensembles of SVM and deep learning models
  publication-title: Database
  doi: 10.1093/database/bay073
– volume-title: Conditional random fields: probabilistic models for segmenting and labeling sequence data
  year: 2001
  ident: 2022092013210653900_ref63
– volume: 2015
  start-page: 918710
  year: 2015
  ident: 2022092013210653900_ref12
  article-title: GNormPlus: an integrative approach for tagging genes, gene families, and protein domains
  publication-title: Biomed Res Int
  doi: 10.1155/2015/918710
– volume: 22
  start-page: 360
  issue: 1
  year: 2021
  ident: 2022092013210653900_ref4
  article-title: Deep learning for drug response prediction in cancer
  publication-title: Brief Bioinform
  doi: 10.1093/bib/bbz171
– volume-title: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
  year: 2017
  ident: 2022092013210653900_ref32
– volume-title: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications
  year: 2004
  ident: 2022092013210653900_ref26
– volume: 19
  start-page: i180
  issue: suppl_1
  year: 2003
  ident: 2022092013210653900_ref5
  article-title: GENIA corpus—a semantically annotated corpus for bio-textmining
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btg1023
– volume-title: Proceedings of the sixth BioCreative Challenge Evaluation Workshop
  year: 2017
  ident: 2022092013210653900_ref10
– volume: 32
  start-page: 2839
  issue: 18
  year: 2016
  ident: 2022092013210653900_ref30
  article-title: TaggerOne: joint named entity recognition and normalization with semi-Markov Models
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btw343
– start-page: 885
  volume-title: Journal of Biomedical Informatics
  year: 2012
  ident: 2022092013210653900_ref51
  article-title: Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports
– volume-title: Proceedings of BioNLP shared task 2011 workshop
  year: 2011
  ident: 2022092013210653900_ref58
– volume: 123
  start-page: 103931
  year: 2021
  ident: 2022092013210653900_ref45
  article-title: Protein-protein interaction relation extraction based on multigranularity semantic fusion
  publication-title: J Biomed Inform
  doi: 10.1016/j.jbi.2021.103931
– volume: 5
  start-page: 101
  year: 2017
  ident: 2022092013210653900_ref56
  article-title: Cross-sentence n-ary relation extraction with graph lstms
  publication-title: Trans Assoc Comput Linguist
  doi: 10.1162/tacl_a_00049
– volume: 2016
  start-page: baw068
  year: 2016
  ident: 2022092013210653900_ref9
  article-title: BioCreative V CDR task corpus: a resource for chemical disease relation extraction
  publication-title: Database
  doi: 10.1093/database/baw068
– volume: 27
  start-page: 3
  issue: 1
  year: 2019
  ident: 2022092013210653900_ref52
  article-title: 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records
  publication-title: J Am Med Inform Assoc
  doi: 10.1093/jamia/ocz166
– volume: 11
  start-page: 1
  issue: 1
  year: 2010
  ident: 2022092013210653900_ref24
  article-title: LINNAEUS: a species name identification system for biomedical literature
  publication-title: BMC Bioinform
  doi: 10.1186/1471-2105-11-85
– start-page: 326
  volume-title: Biocomputing 2002
  year: 2001
  ident: 2022092013210653900_ref36
  doi: 10.1142/9789812799623_0031
– volume: 9
  start-page: 1
  issue: 2
  year: 2008
  ident: 2022092013210653900_ref7
  article-title: Overview of the protein-protein interaction annotation extraction task of BioCreative II
  publication-title: Genome Biol
  doi: 10.1186/gb-2008-9-s2-s1
– volume: 8
  start-page: e65390
  issue: 6
  year: 2013
  ident: 2022092013210653900_ref23
  article-title: The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text
  publication-title: PLoS One
  doi: 10.1371/journal.pone.0065390
– volume: 46
  start-page: 914
  issue: 5
  year: 2013
  ident: 2022092013210653900_ref8
  article-title: The DDI corpus: An annotated corpus with pharmacological substances and drug–drug interactions
  publication-title: J Biomed Inform
  doi: 10.1016/j.jbi.2013.07.011
– volume: 13
  start-page: 1
  issue: 1
  year: 2012
  ident: 2022092013210653900_ref27
  article-title: Concept annotation in the CRAFT corpus
  publication-title: BMC Bioinform
  doi: 10.1186/1471-2105-13-161
– volume: 8
  start-page: 1
  issue: 1
  year: 2007
  ident: 2022092013210653900_ref6
  article-title: BioInfer: a corpus for information extraction in the biomedical domain
  publication-title: BMC Bioinform
  doi: 10.1186/1471-2105-8-50
– volume: 33
  start-page: 139
  issue: 2
  year: 2005
  ident: 2022092013210653900_ref37
  article-title: Comparative experiments on learning information extractors for proteins and their interactions
  publication-title: Artif Intell Med
  doi: 10.1016/j.artmed.2004.07.016
– year: 2020
  ident: 2022092013210653900_ref13
  article-title: Analyzing the effect of multi-task learning for biomedical named entity recognition
– volume-title: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
  year: 2019
  ident: 2022092013210653900_ref34
– volume-title: 4th Learning Language in Logic Workshop (LLL05)
  year: 2005
  ident: 2022092013210653900_ref38
– volume: 29
  start-page: 1433
  issue: 11
  year: 2013
  ident: 2022092013210653900_ref20
  article-title: tmVar: a text mining approach for extracting sequence variants in biomedical literature
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btt156
– volume: 47
  start-page: W587
  issue: W1
  year: 2019
  ident: 2022092013210653900_ref28
  article-title: PubTator central: automated concept annotation for biomedical full text articles
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkz389
– volume: 12
  start-page: e1005017
  issue: 11
  year: 2016
  ident: 2022092013210653900_ref1
  article-title: Text mining genotype-phenotype relationships from biomedical literature for database curation and precision medicine
  publication-title: PLoS Comput Biol
  doi: 10.1371/journal.pcbi.1005017
– volume-title: Proceedings of the BioCreative VII Challenge Evaluation Workshop
  year: 2021
  ident: 2022092013210653900_ref40
– volume: 36
  start-page: 5678
  issue: 24
  year: 2020
  ident: 2022092013210653900_ref64
  article-title: BERT-GT: cross-sentence n-ary relation extraction with BERT and Graph Transformer
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btaa1087
– volume: 8
  start-page: 1
  issue: 1
  year: 2021
  ident: 2022092013210653900_ref17
  article-title: NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature
  publication-title: Sci Data
– volume: 36
  start-page: 1234
  issue: 4
  year: 2020
  ident: 2022092013210653900_ref49
  article-title: BioBERT: a pre-trained biomedical language representation model for biomedical text mining
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btz682
– volume: 29
  start-page: 239
  issue: 1
  year: 2001
  ident: 2022092013210653900_ref50
  article-title: DIP: the database of interacting proteins: 2001 update
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/29.1.239
– volume: 3
  start-page: lqab062
  issue: 3
  year: 2021
  ident: 2022092013210653900_ref54
  article-title: RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion
  publication-title: NAR Genomics and Bioinformatics
– volume: 9
  start-page: 1
  issue: 11
  year: 2008
  ident: 2022092013210653900_ref41
  article-title: All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning
  publication-title: BMC Bioinform
– volume-title: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
  year: 2021
  ident: 2022092013210653900_ref35
– volume: 3
  start-page: 1
  issue: 1
  year: 2021
  ident: 2022092013210653900_ref47
  article-title: Domain-specific language model pretraining for biomedical natural language processing
  publication-title: ACM Trans Comput Healthc
  doi: 10.1145/3458754
– volume: 9
  start-page: 1735
  issue: 8
  year: 1997
  ident: 2022092013210653900_ref62
  article-title: Long short-term memory
  publication-title: Neural Comput
  doi: 10.1162/neco.1997.9.8.1735
– volume-title: Proceedings of the American Association for Cancer Research Annual Meeting
  ident: 2022092013210653900_ref68
– volume: 27
  start-page: 408
  issue: 3
  year: 2011
  ident: 2022092013210653900_ref21
  article-title: Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btq667
– volume: 103
  start-page: 103384
  year: 2020
  ident: 2022092013210653900_ref44
  article-title: A neural network-based joint learning approach for biomedical entity and relation extraction from biomedical literature
  publication-title: J Biomed Inform
  doi: 10.1016/j.jbi.2020.103384
– volume: 47
  start-page: 1
  year: 2014
  ident: 2022092013210653900_ref19
  article-title: NCBI disease corpus: a resource for disease name recognition and concept normalization
  publication-title: J Biomed Inform
  doi: 10.1016/j.jbi.2013.12.006
– volume-title: Proceedings of the 5th International Workshop on Semantic Evaluation, ACL 2010
  year: 2019
  ident: 2022092013210653900_ref31
– volume: 40
  start-page: W585
  issue: W1
  year: 2012
  ident: 2022092013210653900_ref66
  article-title: GeneView: a comprehensive semantic search engine for PubMed
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gks563
– volume-title: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
  year: 2019
  ident: 2022092013210653900_ref29
– volume-title: Proceedings of the 20th Workshop on Biomedical Language Processing
  year: 2021
  ident: 2022092013210653900_ref48
– volume: 23
  start-page: 365
  issue: 3
  year: 2007
  ident: 2022092013210653900_ref39
  article-title: RelEx—relation extraction using dependency parse trees
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btl616
– volume: 23
  start-page: 1862
  issue: 14
  year: 2007
  ident: 2022092013210653900_ref22
  article-title: MutationFinder: a high-performance system for extracting point mutation mentions from text
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btm235
– volume-title: Proceedings of the 20th Workshop on Biomedical Language Processing
  year: 2021
  ident: 2022092013210653900_ref46
– volume-title: Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task
  year: 2009
  ident: 2022092013210653900_ref57
– volume: 46
  start-page: W530
  issue: W1
  year: 2018
  ident: 2022092013210653900_ref65
  article-title: LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gky355
– volume-title: BioCreative VI Challenge Evaluation Workshop
  year: 2017
  ident: 2022092013210653900_ref25
– volume-title: Proceedings of the AMIA Symposium
  year: 2001
  ident: 2022092013210653900_ref53
– volume: 48
  start-page: W5
  issue: W1
  year: 2020
  ident: 2022092013210653900_ref61
  article-title: TeamTat: a collaborative text annotation tool
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkaa333
– volume: 2016
  start-page: baw032
  year: 2016
  ident: 2022092013210653900_ref3
  article-title: Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task
  publication-title: Database
  doi: 10.1093/database/baw032
– volume-title: SEMANTICS Posters&Demos
  year: 2018
  ident: 2022092013210653900_ref67
– volume: 6
  start-page: S11
  issue: 1
  year: 2005
  ident: 2022092013210653900_ref16
  article-title: Overview of BioCreAtIvE task 1B: normalized gene lists
  publication-title: BMC Bioinform
  doi: 10.1186/1471-2105-6-S1-S11
– volume: 9
  start-page: 1
  issue: 2
  year: 2008
  ident: 2022092013210653900_ref15
  article-title: Overview of BioCreative II gene normalization
  publication-title: Genome Biol
– volume: 16
  start-page: 1
  issue: 10
  year: 2015
  ident: 2022092013210653900_ref59
  article-title: Overview of the cancer genetics and pathway curation tasks of BioNLP Shared Task 2013
  publication-title: BMC Bioinform
– volume: 2016
  year: 2016
  ident: 2022092013210653900_ref2
  article-title: BRONCO: Biomedical entity relation oncology corpus for extracting gene-variant-disease-drug relations
  publication-title: Database
  doi: 10.1093/database/baw043
– volume-title: International Conference on Research in Computational Molecular Biology
  year: 2019
  ident: 2022092013210653900_ref55
– volume: 118
  start-page: 103779
  year: 2021
  ident: 2022092013210653900_ref14
  article-title: NLM-Gene, a richly annotated gold standard dataset for gene entities that addresses ambiguity and multi-species gene recognition
  publication-title: J Biomed Inform
  doi: 10.1016/j.jbi.2021.103779
– volume: 34
  start-page: 80
  issue: 1
  year: 2018
  ident: 2022092013210653900_ref60
  article-title: tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btx541
– year: 2006
  volume-title: Linguistic Data Consortium
  ident: 2022092013210653900_ref33
– volume: 7
  start-page: 1
  issue: 1
  year: 2015
  ident: 2022092013210653900_ref18
  article-title: The CHEMDNER corpus of chemicals and drugs and its annotation principles
  publication-title: J Cheminform
– volume: 166
  start-page: 18
  year: 2019
  ident: 2022092013210653900_ref43
  article-title: Feature assisted stacked attentive shortest dependency path based Bi-LSTM model for protein–protein interaction
  publication-title: Knowledge-Based Syst
  doi: 10.1016/j.knosys.2018.11.020
SSID ssj0020781
SecondaryResourceType review_article
SourceID pubmedcentral
proquest
pubmed
crossref
oup
SourceType Open Access Repository
Aggregation Database
Index Database
Enrichment Source
Publisher
SubjectTerms Algorithms
Annotations
Automation
Availability
Benchmarks
Coders
Data Mining
Datasets
Protein interaction
Proteins
PubMed
Review
Title BioRED: a rich biomedical relation extraction dataset
URI https://www.ncbi.nlm.nih.gov/pubmed/35849818
https://www.proquest.com/docview/2717367512
https://www.proquest.com/docview/2691786990
https://pubmed.ncbi.nlm.nih.gov/PMC9487702
Volume 23
linkProvider Oxford University Press
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=BioRED%3A+a+rich+biomedical+relation+extraction+dataset&rft.jtitle=Briefings+in+bioinformatics&rft.au=Luo%2C+Ling&rft.au=Lai%2C+Po-Ting&rft.au=Wei%2C+Chih-Hsuan&rft.au=Arighi%2C+Cecilia+N&rft.date=2022-09-20&rft.eissn=1477-4054&rft.volume=23&rft.issue=5&rft_id=info:doi/10.1093%2Fbib%2Fbbac282&rft_id=info%3Apmid%2F35849818&rft.externalDocID=35849818