Rutabaga by any other name: extracting biological names

Bibliographic Details
Published in	Journal of biomedical informatics Vol. 35; no. 4; pp. 247 - 259
Main Authors	Hirschman, Lynette, Morgan, Alexander A., Yeh, Alexander S.
Format	Journal Article
Language	English
Published	United States Elsevier Inc 01.08.2002
Subjects	Abstracting and Indexing as Topic Biology - methods Database Management Systems Databases, Factual Dictionaries as Topic Information Storage and Retrieval - methods Internet Names Natural Language Processing Software Subject Headings Terminology as Topic User-Computer Interface Vocabulary, Controlled
ISSN	1532-0464 1532-0480
DOI	10.1016/S1532-0464(03)00014-5

Abstract	As the pace of biological research accelerates, biologists are becoming increasingly reliant on computers to manage the information explosion. Biologists communicate their research findings by relying on precise biological terms; these terms then provide indices into the literature and across the growing number of biological databases. This article examines emerging techniques to access biological resources through extraction of entity names and relations among them. Information extraction has been an active area of research in natural language processing and there are promising results for information extraction applied to news stories, e.g., balanced precision and recall in the 93–95% range for identifying person, organization and location names. But these results do not seem to transfer directly to biological names, where results remain in the 75–80% range. Multiple factors may be involved, including absence of shared training and test sets for rigorous measures of progress, lack of annotated training data specific to biological tasks, pervasive ambiguity of terms, frequent introduction of new terms, and a mismatch between evaluation tasks as defined for news and real biological problems. We present evidence from a simple lexical matching exercise that illustrates some specific problems encountered when identifying biological names. We conclude by outlining a research agenda to raise performance of named entity tagging to a level where it can be used to perform tasks of biological importance.
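The abstract refers to a simple lexical matching exercise and to balanced precision and recall without showing either. The sketch below is one minimal way such an exercise might look, assuming a tiny hypothetical lexicon and a made-up sentence; the gene names, the sentence, the gold annotations, and the function names `find_names` and `precision_recall_f` are illustrative assumptions and are not taken from the article or its data.

```python
import re


def find_names(text, lexicon):
    """Return a set of (start, end, name) spans where a lexicon entry occurs in text.

    Matching is case-insensitive and requires that the entry is not
    embedded inside a longer alphanumeric token.
    """
    spans = set()
    for name in lexicon:
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.add((match.start(), match.end(), name))
    return spans


def precision_recall_f(predicted, gold):
    """Compute precision, recall, and balanced F-measure over exact span matches."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: "white" is used once as a gene name and once as an
# ordinary adjective ("white-eyed"), which plain lexical matching cannot tell apart.
text = "We mapped white near the dunce locus in white-eyed flies."
lexicon = {"white", "dunce"}
gold = {(10, 15, "white"), (25, 30, "dunce")}  # assumed hand annotation

predicted = find_names(text, lexicon)
precision, recall, f_measure = precision_recall_f(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

In this toy case the matcher finds every annotated name but also tags the adjectival "white", so recall is perfect while precision drops. That is the kind of ambiguity problem the abstract lists among the factors that may hold back biological name tagging.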
Author email	Hirschman, Lynette: lynette@mitre.org
Copyright	2003 Elsevier Science (USA)
Discipline	Medicine Engineering Public Health
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
License	http://www.elsevier.com/open-access/userlicense/1.0 https://www.elsevier.com/tdm/userlicense/1.0 https://www.elsevier.com/open-access/userlicense/1.0
OpenAccessLink	https://www.sciencedirect.com/science/article/pii/S1532046403000145
PMID	12755519
PageCount	13
PublicationTitleAlternate	J Biomed Inform