Rapid pattern development for concept recognition systems: application to point mutations

Bibliographic Details
Published in Journal of Bioinformatics and Computational Biology Vol. 5; no. 6; p. 1233
Main Authors Caporaso, J Gregory, Baumgartner, William A, Randolph, David A, Cohen, K Bretonnel, Hunter, Lawrence
Format Journal Article
Language English
Published Singapore 01.12.2007
Summary: The primary biomedical literature is being generated at an unprecedented rate, and researchers cannot keep abreast of new developments in their fields. Biomedical natural language processing is being developed to address this issue, but building reliable systems often requires many expert-hours. We present an approach for automatically developing collections of regular expressions to drive high-performance concept recognition systems with minimal human interaction. We applied our approach to develop MutationFinder, a system for automatically extracting mentions of point mutations from text. MutationFinder achieves performance equivalent to or better than manually developed mutation recognition systems, but generating its 759 patterns required only 5.5 expert-hours. We also discuss the development and evaluation of our recently published high-quality, human-annotated gold standard corpus, which contains 1,515 complete point mutation mentions annotated in 813 abstracts. Both MutationFinder and the complete corpus are publicly available at http://mutationfinder.sourceforge.net/.
ISSN:0219-7200
DOI:10.1142/S0219720007003144
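The summary describes concept recognition driven by collections of regular expressions that match point mutation mentions. The following is a minimal sketch of that general idea, assuming two hand-written patterns: a one-letter form (e.g. "A123G") and a three-letter form (e.g. "Ala45Val"). These patterns and the helper find_point_mutations are hypothetical illustrations. They are not MutationFinder's 759 patterns, nor the automated pattern-development method the paper presents.

```python
import re
from typing import List, Tuple

# Hypothetical pattern (not from MutationFinder): one-letter wild-type residue,
# sequence position, one-letter mutant residue, e.g. "A123G".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ONE_LETTER = re.compile(rf"\b([{AMINO_ACIDS}])([1-9][0-9]*)([{AMINO_ACIDS}])\b")

# Hypothetical pattern (not from MutationFinder): three-letter form, e.g. "Ala45Val".
THREE_LETTER_CODES = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
_THREE = "|".join(THREE_LETTER_CODES)
THREE_LETTER = re.compile(rf"\b({_THREE})([1-9][0-9]*)({_THREE})\b")


def find_point_mutations(text: str) -> List[Tuple[str, int, str]]:
    """Return (wild_type, position, mutant) triples, normalized to one-letter codes.

    One-letter matches are listed first, then three-letter matches.
    """
    hits: List[Tuple[str, int, str]] = []
    for m in ONE_LETTER.finditer(text):
        wt, pos, mt = m.groups()
        if wt != mt:  # skip non-mutations such as "A7A"
            hits.append((wt, int(pos), mt))
    for m in THREE_LETTER.finditer(text):
        wt, pos, mt = m.groups()
        wt1, mt1 = THREE_LETTER_CODES[wt], THREE_LETTER_CODES[mt]
        if wt1 != mt1:
            hits.append((wt1, int(pos), mt1))
    return hits
```

For example, find_point_mutations("The A123G and Ala45Val substitutions, but not A7A.") returns [("A", 123, "G"), ("A", 45, "V")]. Normalizing both surface forms to one triple lets different mentions of the same mutation be compared directly. A real system needs many more patterns (other notations, nucleotide changes, and mentions such as "alanine 45 to valine") to reach useful recall, which is the cost the paper's automated approach aims to reduce.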