BioRED: a rich biomedical relation extraction dataset
Published in | Briefings in bioinformatics Vol. 23; no. 5 |
---|---|
Main Authors | Luo, Ling; Lai, Po-Ting; Wei, Chih-Hsuan; Arighi, Cecilia N; Lu, Zhiyong |
Format | Journal Article |
Language | English |
Published | England; Oxford University Press; 20.09.2022; Oxford Publishing Limited (England) |
Abstract | Abstract
Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE only focus on relations of a single type (e.g. protein–protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then, we present a first-of-its-kind biomedical relation extraction dataset (BioRED) with multiple entity types (e.g. gene/protein, disease, chemical) and relation pairs (e.g. gene–disease; chemical–chemical) at the document level, on a set of 600 PubMed abstracts. Furthermore, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information. We assess the utility of BioRED by benchmarking several existing state-of-the-art methods, including Bidirectional Encoder Representations from Transformers (BERT)-based models, on the NER and RE tasks. Our results show that while existing approaches can reach high performance on the NER task (F-score of 89.3%), there is much room for improvement for the RE task, especially when extracting novel relations (F-score of 47.7%). Our experiments also demonstrate that such a rich dataset can successfully facilitate the development of more accurate, efficient and robust RE systems for biomedicine.
Availability: The BioRED dataset and annotation guidelines are freely available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/. |
---|---|
Author | Wei, Chih-Hsuan; Luo, Ling; Lu, Zhiyong; Arighi, Cecilia N; Lai, Po-Ting |
Author_xml | – sequence: 1 givenname: Ling orcidid: 0000-0002-5141-0259 surname: Luo fullname: Luo, Ling email: ling.luo@nih.gov – sequence: 2 givenname: Po-Ting surname: Lai fullname: Lai, Po-Ting email: po-ting.lai@nih.gov – sequence: 3 givenname: Chih-Hsuan surname: Wei fullname: Wei, Chih-Hsuan email: chih-hsuan.wei@nih.gov – sequence: 4 givenname: Cecilia N surname: Arighi fullname: Arighi, Cecilia N email: arighi@udel.edu – sequence: 5 givenname: Zhiyong surname: Lu fullname: Lu, Zhiyong email: zhiyong.lu@nih.gov |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/35849818 (View this record in MEDLINE/PubMed) |
CitedBy_id | crossref_primary_10_1016_j_jbi_2024_104716 crossref_primary_10_1093_bioinformatics_btae474 crossref_primary_10_1016_j_jbi_2023_104431 crossref_primary_10_1093_bioinformatics_btad301 crossref_primary_10_3390_electronics14020328 crossref_primary_10_3390_math11020354 crossref_primary_10_1093_database_baae079 crossref_primary_10_1145_3592601 crossref_primary_10_1093_database_baae039 crossref_primary_10_1016_j_neucom_2025_129752 crossref_primary_10_1093_database_baae071 crossref_primary_10_1093_jamia_ocae202 crossref_primary_10_1186_s12859_024_05951_y crossref_primary_10_1016_j_ascom_2024_100893 crossref_primary_10_1016_j_jbi_2023_104487 crossref_primary_10_1093_bioinformatics_btad310 crossref_primary_10_1093_bioinformatics_btae564 crossref_primary_10_3389_fncom_2024_1389475 crossref_primary_10_1016_j_inffus_2025_103033 crossref_primary_10_1093_database_baae068 crossref_primary_10_1093_database_baad054 crossref_primary_10_1093_database_baae069 crossref_primary_10_3390_app14209302 crossref_primary_10_1016_j_compbiomed_2023_106642 crossref_primary_10_1016_j_compchemeng_2023_108446 crossref_primary_10_1093_database_baae104 crossref_primary_10_1016_j_knosys_2024_112777 crossref_primary_10_1016_j_artmed_2024_102970 crossref_primary_10_1016_j_jbi_2023_104459 crossref_primary_10_1016_j_neucom_2024_129171 crossref_primary_10_1016_j_jbi_2024_104658 crossref_primary_10_1093_bioinformatics_btae418 crossref_primary_10_1162_coli_a_00520 crossref_primary_10_1016_j_jbi_2024_104733 crossref_primary_10_1186_s12859_023_05539_y crossref_primary_10_1016_j_jbi_2024_104731 crossref_primary_10_1093_database_baae057 crossref_primary_10_1093_jamia_ocae061 crossref_primary_10_1186_s12859_024_06008_w crossref_primary_10_26599_BDMA_2023_9020007 crossref_primary_10_1093_nar_gkae235 crossref_primary_10_1093_jamia_ocae147 crossref_primary_10_1016_j_ipm_2025_104128 crossref_primary_10_1093_database_baae095 crossref_primary_10_1093_bib_bbae132 crossref_primary_10_1093_bib_bbaf025 
crossref_primary_10_1016_j_heliyon_2023_e20505 crossref_primary_10_1093_bioadv_vbae116 crossref_primary_10_1371_journal_pone_0292356 crossref_primary_10_1038_s41597_024_03835_7 crossref_primary_10_1016_j_websem_2022_100756 crossref_primary_10_1109_ACCESS_2024_3509714 crossref_primary_10_1038_s42256_025_01014_w crossref_primary_10_1093_database_baae125 crossref_primary_10_1093_bioadv_vbae045 crossref_primary_10_3390_ijms232314934 crossref_primary_10_1016_j_jbi_2024_104719 crossref_primary_10_1093_jamia_ocae037 crossref_primary_10_1021_acs_jproteome_4c00535 |
Cites_doi | 10.1186/s12859-019-3000-5 10.1093/database/bay073 10.1155/2015/918710 10.1093/bib/bbz171 10.1093/bioinformatics/btg1023 10.1093/bioinformatics/btw343 10.1016/j.jbi.2021.103931 10.1162/tacl_a_00049 10.1093/database/baw068 10.1093/jamia/ocz166 10.1186/1471-2105-11-85 10.1142/9789812799623_0031 10.1186/gb-2008-9-s2-s1 10.1371/journal.pone.0065390 10.1016/j.jbi.2013.07.011 10.1186/1471-2105-13-161 10.1186/1471-2105-8-50 10.1016/j.artmed.2004.07.016 10.1093/bioinformatics/btt156 10.1093/nar/gkz389 10.1371/journal.pcbi.1005017 10.1093/bioinformatics/btaa1087 10.1093/bioinformatics/btz682 10.1093/nar/29.1.239 10.1145/3458754 10.1162/neco.1997.9.8.1735 10.1093/bioinformatics/btq667 10.1016/j.jbi.2020.103384 10.1016/j.jbi.2013.12.006 10.1093/nar/gks563 10.1093/bioinformatics/btl616 10.1093/bioinformatics/btm235 10.1093/nar/gky355 10.1093/nar/gkaa333 10.1093/database/baw032 10.1186/1471-2105-6-S1-S11 10.1093/database/baw043 10.1016/j.jbi.2021.103779 10.1093/bioinformatics/btx541 10.1016/j.knosys.2018.11.020 |
ContentType | Journal Article |
Copyright | The Author(s) 2022. Published by Oxford University Press. |
Copyright_xml | – notice: The Author(s) 2022. Published by Oxford University Press. 2022 – notice: The Author(s) 2022. Published by Oxford University Press. |
DBID | TOX AAYXX CITATION CGR CUY CVF ECM EIF NPM 7QO 7SC 8FD FR3 JQ2 K9. L7M L~C L~D P64 RC3 7X8 5PM |
DOI | 10.1093/bib/bbac282 |
DatabaseName | Oxford Journals Open Access Collection CrossRef Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed Biotechnology Research Abstracts Computer and Information Systems Abstracts Technology Research Database Engineering Research Database ProQuest Computer Science Collection ProQuest Health & Medical Complete (Alumni) Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional Biotechnology and BioEngineering Abstracts Genetics Abstracts MEDLINE - Academic PubMed Central (Full Participant titles) |
DatabaseTitle | CrossRef MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) Genetics Abstracts Biotechnology Research Abstracts Technology Research Database Computer and Information Systems Abstracts – Academic ProQuest Computer Science Collection Computer and Information Systems Abstracts ProQuest Health & Medical Complete (Alumni) Engineering Research Database Advanced Technologies Database with Aerospace Biotechnology and BioEngineering Abstracts Computer and Information Systems Abstracts Professional MEDLINE - Academic |
DatabaseTitleList | MEDLINE - Academic CrossRef MEDLINE Genetics Abstracts |
Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: EIF name: MEDLINE url: https://proxy.k.utb.cz/login?url=https://www.webofscience.com/wos/medline/basic-search sourceTypes: Index Database – sequence: 3 dbid: TOX name: Oxford Journals Open Access Collection url: https://academic.oup.com/journals/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Biology |
EISSN | 1477-4054 |
ExternalDocumentID | PMC9487702 35849818 10_1093_bib_bbac282 10.1093/bib/bbac282 |
Genre | Journal Article; Research Support, N.I.H., Intramural; Research Support, N.I.H., Extramural |
GrantInformation_xml | – fundername: NHGRI NIH HHS grantid: U24 HG007822 – fundername: NIGMS NIH HHS grantid: R35 GM141873 – fundername: ; – fundername: ; grantid: 2U24HG007822-08 |
ID | FETCH-LOGICAL-c440t-9b4a1942393bf2f4d4a186b7d9ae0981e54166bc9c4dd3ddc5f17ee6171877993 |
IEDL.DBID | TOX |
ISSN | 1467-5463 1477-4054 |
IngestDate | Thu Aug 21 18:39:53 EDT 2025 Fri Jul 11 09:57:46 EDT 2025 Mon Jun 30 08:52:27 EDT 2025 Mon Jul 21 06:07:44 EDT 2025 Thu Apr 24 22:56:56 EDT 2025 Tue Jul 01 03:39:42 EDT 2025 Wed Aug 28 03:18:17 EDT 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 5 |
Keywords | named entity recognition; biomedical dataset; relation extraction; biomedical natural language processing |
Language | English |
License | This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com. |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c440t-9b4a1942393bf2f4d4a186b7d9ae0981e54166bc9c4dd3ddc5f17ee6171877993 |
Notes | The authors wish it to be known that, in their opinion, Ling Luo, Po-Ting Lai and Chih-Hsuan Wei should be regarded as joint first authors. |
ORCID | 0000-0002-5141-0259 |
OpenAccessLink | https://dx.doi.org/10.1093/bib/bbac282 |
PMID | 35849818 |
PQID | 2717367512 |
PQPubID | 26846 |
ParticipantIDs | pubmedcentral_primary_oai_pubmedcentral_nih_gov_9487702 proquest_miscellaneous_2691786990 proquest_journals_2717367512 pubmed_primary_35849818 crossref_primary_10_1093_bib_bbac282 crossref_citationtrail_10_1093_bib_bbac282 oup_primary_10_1093_bib_bbac282 |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | 2022-09-20 |
PublicationDateYYYYMMDD | 2022-09-20 |
PublicationDate_xml | – month: 09 year: 2022 text: 2022-09-20 day: 20 |
PublicationDecade | 2020 |
PublicationPlace | England |
PublicationPlace_xml | – name: England – name: Oxford |
PublicationTitle | Briefings in bioinformatics |
PublicationTitleAlternate | Brief Bioinform |
PublicationYear | 2022 |
Publisher | Oxford University Press; Oxford Publishing Limited (England) |
Publisher_xml | – name: Oxford University Press – name: Oxford Publishing Limited (England) |
Semantic Evaluation, ACL 2010 year: 2019 ident: 2022092013210653900_ref31 – volume: 40 start-page: W585 issue: W1 year: 2012 ident: 2022092013210653900_ref66 article-title: GeneView: a comprehensive semantic search engine for PubMed publication-title: Nucleic Acids Res doi: 10.1093/nar/gks563 – volume-title: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics year: 2019 ident: 2022092013210653900_ref29 – volume-title: Proceedings of the 20th Workshop on Biomedical Language Processing year: 2021 ident: 2022092013210653900_ref48 – volume: 23 start-page: 365 issue: 3 year: 2007 ident: 2022092013210653900_ref39 article-title: RelEx—relation extraction using dependency parse trees publication-title: Bioinformatics doi: 10.1093/bioinformatics/btl616 – volume: 23 start-page: 1862 issue: 14 year: 2007 ident: 2022092013210653900_ref22 article-title: MutationFinder: a high-performance system for extracting point mutation mentions from text publication-title: Bioinformatics doi: 10.1093/bioinformatics/btm235 – volume-title: Proceedings of the 20th Workshop on Biomedical Language Processing year: 2021 ident: 2022092013210653900_ref46 – volume-title: Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task year: 2009 ident: 2022092013210653900_ref57 – volume: 46 start-page: W530 issue: W1 year: 2018 ident: 2022092013210653900_ref65 article-title: LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC publication-title: Nucleic Acids Res doi: 10.1093/nar/gky355 – volume-title: BioCreative VI Challenge Evaluation Workshop year: 2017 ident: 2022092013210653900_ref25 – volume-title: Proceedings of the AMIA Symposium year: 2001 ident: 2022092013210653900_ref53 – volume: 48 start-page: W5 issue: W1 year: 2020 ident: 2022092013210653900_ref61 article-title: TeamTat: a collaborative text annotation tool publication-title: Nucleic Acids Res doi: 10.1093/nar/gkaa333 – volume: 
2016 start-page: baw032 year: 2016 ident: 2022092013210653900_ref3 article-title: Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task publication-title: Database doi: 10.1093/database/baw032 – volume-title: SEMANTICS Posters&Demos year: 2018 ident: 2022092013210653900_ref67 – volume: 6 start-page: S11 issue: 1 year: 2005 ident: 2022092013210653900_ref16 article-title: Overview of BioCreAtIvE task 1B: normalized gene lists publication-title: BMC Bioinformat doi: 10.1186/1471-2105-6-S1-S11 – volume: 9 start-page: 1 issue: 2 year: 2008 ident: 2022092013210653900_ref15 article-title: Overview of BioCreative II gene normalization publication-title: Genome Biol – volume: 16 start-page: 1 issue: 10 year: 2015 ident: 2022092013210653900_ref59 article-title: Overview of the cancer genetics and pathway curation tasks of bionlp shared task 2013 publication-title: BMC Bioinformat – volume: 2016 year: 2016 ident: 2022092013210653900_ref2 article-title: BRONCO: Biomedical entity relation oncology corpus for extracting gene-variant-disease-drug relations publication-title: Database doi: 10.1093/database/baw043 – volume-title: International Conference on Research in Computational Molecular Biology year: 2019 ident: 2022092013210653900_ref55 – volume: 118 start-page: 103779 year: 2021 ident: 2022092013210653900_ref14 article-title: NLM-Gene, a richly annotated gold standard dataset for gene entities that addresses ambiguity and multi-species gene recognition publication-title: J Biomed Inform doi: 10.1016/j.jbi.2021.103779 – volume: 34 start-page: 80 issue: 1 year: 2018 ident: 2022092013210653900_ref60 article-title: tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine publication-title: Bioinformatics doi: 10.1093/bioinformatics/btx541 – start-page: 2006 volume-title: Linguistic Data Consortium ident: 2022092013210653900_ref33 – volume: 7 
start-page: 1 issue: 1 year: 2015 ident: 2022092013210653900_ref18 article-title: The CHEMDNER corpus of chemicals and drugs and its annotation principles publication-title: J Chem – volume: 166 start-page: 18 year: 2019 ident: 2022092013210653900_ref43 article-title: Feature assisted stacked attentive shortest dependency path based Bi-LSTM model for protein–protein interaction publication-title: Knowledge-Base Syst doi: 10.1016/j.knosys.2018.11.020 |
SecondaryResourceType | review_article |
Snippet | Abstract
Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world... |
SourceID | pubmedcentral; proquest; pubmed; crossref; oup |
SourceType | Open Access Repository; Aggregation Database; Index Database; Enrichment Source; Publisher |
SubjectTerms | Algorithms; Annotations; Automation; Availability; Benchmarks; Coders; Data Mining; Datasets; Protein interaction; Proteins; PubMed; Review |
Title | BioRED: a rich biomedical relation extraction dataset |
URI | https://www.ncbi.nlm.nih.gov/pubmed/35849818 https://www.proquest.com/docview/2717367512 https://www.proquest.com/docview/2691786990 https://pubmed.ncbi.nlm.nih.gov/PMC9487702 |
Volume | 23 |
linkProvider | Oxford University Press |