Automatic extraction of gene/protein biological functions from biomedical text


Bibliographic Details
Published in Bioinformatics Vol. 21; no. 7; pp. 1227-1236
Main Authors Koike, Asako; Niwa, Yoshiki; Takagi, Toshihisa
Format Journal Article
Language English
Published Oxford: Oxford University Press, 01.04.2005
Oxford Publishing Limited (England)
Summary:
Motivation: With the rapid advancement of biomedical science and the development of high-throughput analysis methods, the extraction of various types of information from biomedical text has become critical. Since automatic functional annotations of genes are quite useful for interpreting large amounts of high-throughput data efficiently, the demand for automatic extraction of information related to gene functions from text has been increasing.
Results: We have developed a method for automatically extracting the biological process functions of genes/proteins/families based on Gene Ontology (GO) from text using a shallow parser and sentence structure analysis techniques. When the gene/protein/family names and their functions are described in ACTOR (doer of action) and OBJECT (receiver of action) relationships, the corresponding GO-IDs are assigned to the genes/proteins/families. The gene/protein/family names are recognized using the gene/protein/family name dictionaries developed by our group. To achieve wide recognition of the gene/protein/family functions, we semi-automatically gather functional terms based on GO using co-occurrence, collocation similarities and rule-based techniques. A preliminary experiment demonstrated that our method has an estimated recall of 54–64% with a precision of 91–94% for functions actually described in abstracts. When applied to PubMed, it extracted over 190 000 gene–GO relationships and 150 000 family–GO relationships for major eukaryotes.
Availability: The extracted gene functions are available at http://prime.ontology.ims.u-tokyo.ac.jp
Contact: akoike@hgc.jp
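The core idea in the Results paragraph (assign a GO-ID only when a dictionary-recognized gene name and a functional term stand in an ACTOR/OBJECT relationship) can be illustrated with a toy sketch. This is not the authors' implementation: the paper uses a shallow parser and sentence structure analysis, whereas the sketch below swaps in two simple regular-expression patterns (active and passive voice). The dictionary entries, verb lists, and the function name `extract_gene_go` are all assumptions chosen for illustration.

```python
import re

# Toy dictionaries (illustrative entries only, not the authors' resources).
GENE_NAMES = {"p53", "Bax", "cyclin D1"}
FUNCTION_TERMS = {
    "apoptosis": "GO:0006915",
    "cell cycle": "GO:0007049",
}

# Hypothetical verb lists standing in for real sentence structure analysis.
VERBS = r"(?:induces|regulates|promotes|inhibits|controls)"
PARTICIPLES = r"(?:induced|regulated|promoted|inhibited|controlled)"


def _alternation(terms):
    # Longest terms first so multi-word names win over shorter ones.
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


GENE = _alternation(GENE_NAMES)
FUNC = _alternation(FUNCTION_TERMS)

# Active voice: gene (ACTOR) + verb + functional term (OBJECT).
ACTIVE = re.compile(
    rf"\b(?P<gene>{GENE})\s+{VERBS}\s+(?:the\s+)?(?P<func>{FUNC})\b",
    re.IGNORECASE,
)
# Passive voice: functional term + "is/was <participle> by" + gene (ACTOR).
PASSIVE = re.compile(
    rf"\b(?P<func>{FUNC})\s+(?:is|was)\s+{PARTICIPLES}\s+by\s+(?P<gene>{GENE})\b",
    re.IGNORECASE,
)


def extract_gene_go(sentence):
    """Return sorted (gene, GO-ID) pairs found in ACTOR/OBJECT patterns."""
    pairs = set()
    for pattern in (ACTIVE, PASSIVE):
        for m in pattern.finditer(sentence):
            go_id = FUNCTION_TERMS[m.group("func").lower()]
            pairs.add((m.group("gene"), go_id))
    return sorted(pairs)


if __name__ == "__main__":
    print(extract_gene_go("p53 induces apoptosis in tumor cells."))
    print(extract_gene_go("The cell cycle is regulated by cyclin D1."))
    # Mere co-occurrence without an ACTOR/OBJECT pattern yields no pair.
    print(extract_gene_go("p53 and apoptosis were both mentioned."))
```

Requiring a relational pattern rather than plain co-occurrence trades recall for precision, which is consistent in spirit with the high-precision, moderate-recall figures the abstract reports, though the sketch itself makes no claim to reproduce them.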
ISSN: 1367-4803; 1460-2059; 1367-4811
DOI: 10.1093/bioinformatics/bti084