Rutabaga by any other name: extracting biological names

As the pace of biological research accelerates, biologists are becoming increasingly reliant on computers to manage the information explosion. Biologists communicate their research findings by relying on precise biological terms; these terms then provide indices into the literature and across the gr...

Bibliographic Details
Published in Journal of biomedical informatics Vol. 35; no. 4; pp. 247-259
Main Authors Hirschman, Lynette, Morgan, Alexander A., Yeh, Alexander S.
Format Journal Article
Language English
Published United States Elsevier Inc 01.08.2002
ISSN 1532-0464
1532-0480
DOI 10.1016/S1532-0464(03)00014-5

Abstract As the pace of biological research accelerates, biologists are becoming increasingly reliant on computers to manage the information explosion. Biologists communicate their research findings by relying on precise biological terms; these terms then provide indices into the literature and across the growing number of biological databases. This article examines emerging techniques to access biological resources through extraction of entity names and relations among them. Information extraction has been an active area of research in natural language processing and there are promising results for information extraction applied to news stories, e.g., balanced precision and recall in the 93–95% range for identifying person, organization and location names. But these results do not seem to transfer directly to biological names, where results remain in the 75–80% range. Multiple factors may be involved, including absence of shared training and test sets for rigorous measures of progress, lack of annotated training data specific to biological tasks, pervasive ambiguity of terms, frequent introduction of new terms, and a mismatch between evaluation tasks as defined for news and real biological problems. We present evidence from a simple lexical matching exercise that illustrates some specific problems encountered when identifying biological names. We conclude by outlining a research agenda to raise performance of named entity tagging to a level where it can be used to perform tasks of biological importance.
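The abstract refers to a simple lexical matching exercise and to balanced precision and recall. As a rough illustration only, and not the authors' method, the following Python sketch matches a toy lexicon against a sentence and scores the result. The function names `find_names` and `precision_recall_f1`, the two-entry lexicon, the example sentence, and the gold annotation are all assumptions made up for this sketch; the balanced score is computed here as the harmonic mean of precision and recall.

```python
import re


def find_names(text, lexicon):
    """Return the set of (start, end) spans where any lexicon entry occurs in text.

    Hypothetical helper: case-insensitive whole-word matching only.
    """
    spans = set()
    for name in lexicon:
        # Require that the match is not embedded inside a longer word.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.add((match.start(), match.end()))
    return spans


def precision_recall_f1(predicted, gold):
    """Score predicted spans against gold spans using exact span equality."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (assumed for illustration): "white" appears once as a gene name
# and once as an ordinary English word.
text = "The white gene and the wingless gene interact; mutants were white."
lexicon = {"white", "wingless"}

# Hand-made gold annotation: only the first "white" and "wingless" are names.
first_white = text.index("white")
wingless = text.index("wingless")
gold = {
    (first_white, first_white + len("white")),
    (wingless, wingless + len("wingless")),
}

predicted = find_names(text, lexicon)
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the matcher finds every gold name but also tags the second, non-name use of "white", so recall is perfect while precision drops. That is one small instance of the term ambiguity the abstract lists among the factors that may hold back biological name extraction.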
Author Yeh, Alexander S.
Hirschman, Lynette
Morgan, Alexander A.
Author_xml – sequence: 1
  givenname: Lynette
  surname: Hirschman
  fullname: Hirschman, Lynette
  email: lynette@mitre.org
– sequence: 2
  givenname: Alexander A.
  surname: Morgan
  fullname: Morgan, Alexander A.
– sequence: 3
  givenname: Alexander S.
  surname: Yeh
  fullname: Yeh, Alexander S.
BackLink https://www.ncbi.nlm.nih.gov/pubmed/12755519$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1186_1471_2105_6_S1_S5
crossref_primary_10_1186_1471_2105_6_S1_S2
crossref_primary_10_1016_j_tibtech_2006_10_002
crossref_primary_10_1021_acs_chemrev_6b00851
crossref_primary_10_1186_2041_1480_2_1
crossref_primary_10_1093_bib_bbn043
crossref_primary_10_1016_j_jbi_2011_10_004
crossref_primary_10_1142_S0219720004000399
crossref_primary_10_1186_1756_0381_5_13
crossref_primary_10_1016_j_cell_2008_06_029
crossref_primary_10_1016_j_jbi_2004_08_010
crossref_primary_10_1371_journal_pcbi_1000411
crossref_primary_10_1016_j_jbi_2003_10_001
crossref_primary_10_1186_s12859_015_0487_2
crossref_primary_10_1108_00220411211200301
crossref_primary_10_1186_1471_2105_7_372
crossref_primary_10_1186_1471_2105_12_S8_S5
crossref_primary_10_1093_bioinformatics_btx815
crossref_primary_10_1007_s10791_008_9072_x
crossref_primary_10_1109_TITB_2005_856857
crossref_primary_10_1002_cfg_459
crossref_primary_10_1016_j_compbiolchem_2004_09_010
crossref_primary_10_1093_database_baac039
crossref_primary_10_1186_1471_2105_6_103
crossref_primary_10_1186_1472_6947_12_36
crossref_primary_10_1093_bib_bbr018
crossref_primary_10_1109_ACCESS_2019_2932842
crossref_primary_10_1016_j_patter_2021_100328
crossref_primary_10_1074_jbc_R110_176370
crossref_primary_10_1093_database_bas042
crossref_primary_10_3390_fi11090185
crossref_primary_10_1007_s10257_014_0259_y
crossref_primary_10_1186_1471_2105_6_S1_S15
crossref_primary_10_1186_gb_2008_9_s2_s13
crossref_primary_10_1186_1471_2105_7_220
crossref_primary_10_1016_j_jbi_2004_08_004
crossref_primary_10_1093_bioinformatics_bti296
crossref_primary_10_1186_1471_2105_8_S9_S5
crossref_primary_10_1186_1471_2105_6_88
crossref_primary_10_1186_1472_6947_5_35
crossref_primary_10_3389_fdgth_2022_1065581
crossref_primary_10_1093_bioinformatics_bti733
ContentType Journal Article
Copyright 2003 Elsevier Science (USA)
Copyright_xml – notice: 2003 Elsevier Science (USA)
DatabaseName ScienceDirect Open Access Titles
Elsevier:ScienceDirect:Open Access
CrossRef
Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
Biotechnology Research Abstracts
Technology Research Database
Engineering Research Database
Biotechnology and BioEngineering Abstracts
MEDLINE - Academic
DatabaseTitle CrossRef
MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
Engineering Research Database
Biotechnology Research Abstracts
Technology Research Database
Biotechnology and BioEngineering Abstracts
MEDLINE - Academic
DatabaseTitleList
Engineering Research Database
MEDLINE
MEDLINE - Academic
Discipline Medicine
Engineering
Public Health
EISSN 1532-0480
EndPage 259
ExternalDocumentID 12755519
10_1016_S1532_0464_03_00014_5
S1532046403000145
Genre Journal Article
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 4
Language English
License http://www.elsevier.com/open-access/userlicense/1.0
https://www.elsevier.com/tdm/userlicense/1.0
https://www.elsevier.com/open-access/userlicense/1.0
OpenAccessLink https://www.sciencedirect.com/science/article/pii/S1532046403000145
PMID 12755519
PublicationCentury 2000
PublicationDate 2002-08-01
PublicationDateYYYYMMDD 2002-08-01
PublicationDate_xml – month: 08
  year: 2002
  text: 2002-08-01
  day: 01
PublicationDecade 2000
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Journal of biomedical informatics
PublicationTitleAlternate J Biomed Inform
PublicationYear 2002
Publisher Elsevier Inc
Publisher_xml – name: Elsevier Inc
SecondaryResourceType review_article
SourceID proquest
pubmed
crossref
elsevier
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 247
SubjectTerms Abstracting and Indexing as Topic
Biology - methods
Database Management Systems
Databases, Factual
Dictionaries as Topic
Information Storage and Retrieval - methods
Internet
Names
Natural Language Processing
Software
Subject Headings
Terminology as Topic
User-Computer Interface
Vocabulary, Controlled
Title Rutabaga by any other name: extracting biological names
URI https://dx.doi.org/10.1016/S1532-0464(03)00014-5
https://www.ncbi.nlm.nih.gov/pubmed/12755519
https://www.proquest.com/docview/18706289
https://www.proquest.com/docview/72891387
Volume 35