Towards Evidence-based Precision Medicine: Extracting Population Information from Biomedical Text using Binary Classifiers and Syntactic Patterns

Precision Medicine is an emerging approach for prevention and treatment of disease that considers individual variability in genes, environment, and lifestyle for each person. The dissemination of individualized evidence by automatically identifying population information in literature is a key for e...

Full description

Saved in:
Bibliographic Details
Published inAMIA Summits on Translational Science proceedings Vol. 2016; pp. 203 - 212
Main Authors Raja, Kalpana, Dasot, Naman, Goyal, Pawan, Jonnalagadda, Siddhartha R
Format Journal Article
LanguageEnglish
Published United States American Medical Informatics Association 2016
Online AccessGet full text

Cover

Loading…
More Information
Summary:Precision Medicine is an emerging approach for prevention and treatment of disease that considers individual variability in genes, environment, and lifestyle for each person. The dissemination of individualized evidence by automatically identifying population information in literature is a key for evidence-based precision medicine at the point-of-care. We propose a hybrid approach using natural language processing techniques to automatically extract the population information from biomedical literature. Our approach first implements a binary classifier to classify sentences with or without population information. A rule-based system based on syntactic-tree regular expressions is then applied to sentences containing population information to extract the population named entities. The proposed two-stage approach achieved an F-score of 0.81 using a MaxEnt classifier and the rule- based system, and an F-score of 0.87 using a Nai've-Bayes classifier and the rule-based system, and performed relatively well compared to many existing systems. The system and evaluation dataset is being released as open source.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:2153-4063
2153-4063