BioNorm: deep learning-based event normalization for the curation of reaction databases

Abstract Motivation A biochemical reaction, bio-event, depicts the relationships between participating entities. Current text mining research has been focusing on identifying bio-events from scientific literature. However, rare efforts have been dedicated to normalize bio-events extracted from scien...

Full description

Saved in:
Bibliographic Details
Published inBioinformatics Vol. 36; no. 2; pp. 611 - 620
Main Authors Lou, Peiliang, Jimeno Yepes, Antonio, Zhang, Zai, Zheng, Qinghua, Zhang, Xiangrong, Li, Chen
Format Journal Article
LanguageEnglish
Published England Oxford University Press 15.01.2020
Online AccessGet full text

Cover

Loading…
More Information
Summary:Abstract Motivation A biochemical reaction, bio-event, depicts the relationships between participating entities. Current text mining research has been focusing on identifying bio-events from scientific literature. However, rare efforts have been dedicated to normalize bio-events extracted from scientific literature with the entries in the curated reaction databases, which could disambiguate the events and further support interconnecting events into biologically meaningful and complete networks. Results In this paper, we propose BioNorm, a novel method of normalizing bio-events extracted from scientific literature to entries in the bio-molecular reaction database, e.g. IntAct. BioNorm considers event normalization as a paraphrase identification problem. It represents an entry as a natural language statement by combining multiple types of information contained in it. Then, it predicts the semantic similarity between the natural language statement and the statements mentioning events in scientific literature using a long short-term memory recurrent neural network (LSTM). An event will be normalized to the entry if the two statements are paraphrase. To the best of our knowledge, this is the first attempt of event normalization in the biomedical text mining. The experiments have been conducted using the molecular interaction data from IntAct. The results demonstrate that the method could achieve F-score of 0.87 in normalizing event-containing statements. Availability and implementation The source code is available at the gitlab repository https://gitlab.com/BioAI/leen and BioASQvec Plus is available on figshare https://figshare.com/s/45896c31d10c3f6d857a.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1367-4803
1460-2059
1367-4811
DOI:10.1093/bioinformatics/btz571