BioCreAtIvE task1A: entity identification with a stochastic tagger
Published in | BMC bioinformatics Vol. 6 Suppl 1; no. S1; p. S4 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | England: BioMed Central Ltd (BMC), 24.05.2005 |
Summary: | Our approach to Task 1A was inspired by Tanabe and Wilbur's ABGene system. Like Tanabe and Wilbur, we approached the problem as one of part-of-speech tagging, adding a GENE tag to the standard tag set. Where their system uses the Brill tagger, we used TnT, the Trigrams 'n' Tags HMM-based part-of-speech tagger. Based on careful error analysis, we implemented a set of post-processing rules to correct both false positives and false negatives. We participated in both the open and the closed divisions; for the open division, we made use of data from NCBI.
Our base system without post-processing achieved a precision and recall of 68.0% and 77.2%, respectively, giving an F-measure of 72.3%. The full system with post-processing achieved a precision and recall of 80.3% and 80.5%, respectively, giving an F-measure of 80.4%. We achieved a slight improvement (F-measure = 80.9%) by employing a dictionary-based post-processing step for the open division. We placed third in both the open and the closed divisions.
Our results show that a part-of-speech tagger can be augmented with post-processing rules, resulting in an entity identification system that competes well with other approaches. |
---|---|
ISSN: | 1471-2105 |
DOI: | 10.1186/1471-2105-6-S1-S4 |
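
The F-measures quoted in the summary can be checked from the reported precision and recall. The sketch below assumes the F-measure is the balanced harmonic mean of precision and recall (F1); the record does not state the weighting, but the reported figures are consistent with it. The function name `f_measure` and the `reported` dictionary are illustrative only and are not part of the authors' system.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall (F1).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, recall, and reported F-measure (all in percent), taken from the summary.
reported = {
    "base system": (68.0, 77.2, 72.3),
    "with post-processing": (80.3, 80.5, 80.4),
}

for name, (p, r, f_reported) in reported.items():
    f = f_measure(p, r)
    print(f"{name}: F = {f:.1f} (reported {f_reported})")
```

The open-division figure (F-measure = 80.9%) cannot be checked this way, because the summary does not give the corresponding precision and recall.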