Weakly supervised learning of biomedical information extraction from curated data

Bibliographic Details
Published in	BMC Bioinformatics, Vol. 17, no. S1, p. 1
Main Authors	Jain, Suvir; R., Kashyap; Kuo, Tsung-Ting; Bhargava, Shitij; Lin, Gordon; Hsu, Chun-Nan
Format	Journal Article
Language	English
Published	England: BioMed Central Ltd, 11.01.2016
Subjects	Abstracting and Indexing as Topic - methods; Computational linguistics; Data Curation; Data mining; Data Mining - methods; Databases, Factual; Disease - genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Genomics; Humans; Language processing; Machine learning; Natural language interfaces; Proceedings; Risk Assessment; New York
ISSN	1471-2105
DOI	10.1186/s12859-015-0844-1

Abstract	Numerous publicly available biomedical databases derive their data by curation from the literature. The curated data can be useful as training examples for information extraction, but they usually lack the exact mentions and their locations in the text that supervised machine learning requires. This paper describes a general approach to information extraction using curated data as training examples. The idea is to formulate the problem as cost-sensitive learning from noisy labels, where the cost is estimated by a committee of weak classifiers that consider both the curated data and the text. We test the idea on two information extraction tasks for Genome-Wide Association Studies (GWAS). The first task is to extract the target phenotypes (diseases or traits) of a study, and the second is to extract the ethnic backgrounds of study subjects for different stages (initial or replication). Experimental results show that our approach achieves a Precision-at-2 (P@2) of 87% for disease/trait extraction and an F1-score of 0.83 for stage-ethnicity extraction, both outperforming their cost-insensitive baseline counterparts. The results show that curated biomedical databases can potentially be reused as training examples to train information extractors without expert annotation or refinement, opening an unprecedented opportunity to use "big data" in biomedical text mining.
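The cost-sensitive idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the committee turns its votes into a cost, so the agreement-fraction rule below, along with the names `committee_costs` and `weighted_error`, is a hypothetical stand-in and not the authors' method. The sketch assumes each candidate mention carries a noisy label derived from the curated record and a list of votes from weak classifiers.

```python
# Illustrative sketch only. The agreement-fraction rule and all names here
# are assumptions; the abstract does not specify the actual cost estimate.

def committee_costs(noisy_labels, committee_votes):
    """Return one cost per example: the fraction of committee members
    whose vote agrees with the noisy (curation-derived) label."""
    costs = []
    for label, votes in zip(noisy_labels, committee_votes):
        agree = sum(1 for v in votes if v == label)
        costs.append(agree / len(votes))
    return costs


def weighted_error(predictions, noisy_labels, costs):
    """Cost-sensitive error: each mistake counts by its cost, so labels
    that the committee distrusts contribute less to the total."""
    total = sum(costs)
    wrong = sum(c for p, y, c in zip(predictions, noisy_labels, costs) if p != y)
    return wrong / total if total else 0.0


# Toy data: four candidate mentions, three weak classifiers each.
noisy_labels = [1, 1, 0, 0]
committee_votes = [
    [1, 1, 1],  # committee fully supports the noisy label
    [0, 0, 1],  # committee mostly disagrees: likely a noisy label
    [0, 0, 0],
    [1, 0, 0],
]
costs = committee_costs(noisy_labels, committee_votes)
predictions = [1, 0, 0, 0]  # hypothetical extractor output
print(costs)
print(weighted_error(predictions, noisy_labels, costs))
```

In this toy case the only mistake falls on the example the committee distrusts, so the weighted error is lower than the unweighted error of 0.25. In practice such costs would typically be passed as per-example weights to whatever learner is being trained.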
ArticleNumber	S1
Audience	Academic
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/26817711$$D View this record in MEDLINE/PubMed
Copyright	COPYRIGHT 2016 BioMed Central Ltd.; Jain et al. 2015
Discipline	Biology
Genre	Journal Article Research Support, N.I.H., Extramural
GeographicLocations	New York
GrantInformation	NHGRI NIH HHS, grant U01 HG006894
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
License	Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
OpenAccessLink	http://journals.scholarsportal.info/openUrl.xqy?doi=10.1186/s12859-015-0844-1
PMID	26817711
PublicationPlace	London, England
URI	https://www.ncbi.nlm.nih.gov/pubmed/26817711 https://www.proquest.com/docview/1761463661 https://pubmed.ncbi.nlm.nih.gov/PMC4847485