Extraction of protein-protein interactions using natural language processing based pattern matching

Bibliographic Details
Published in: 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1292-1295
Main Authors: Kaixian Yu, Tingting Zhao, Peixiang Zhao, Jinfeng Zhang
Format: Conference Proceeding
Language: English
Published: IEEE, 01.11.2017
Summary: A significant part of our knowledge consists of relationships between two terms. However, most of this information is documented as unstructured text in various forms, such as books, online articles, and webpages. Extracting this information and storing it in a structured database could help people use it more conveniently. In this study, we propose a novel approach to extracting relationship information based on Natural Language Processing (NLP) and a graph-theoretic algorithm. Our method, Grammatical Relationship Graph for Triplets (GRGT), extracts three layers of information: the pairs of terms that have a certain relationship, the exact type of that relationship, and its direction. GRGT works on a grammatical graph obtained by parsing the sentence using NLP. Patterns are extracted from the graph as the shortest paths between the words of interest, and a decision tree performs the pattern matching. GRGT was applied to extract protein-protein interactions (PPIs) from biomedical literature and obtained better precision than the best-performing method in the literature. Beyond extracting PPIs, our method could be easily extended to extract relationship information between other bio-entities.
DOI: 10.1109/BIBM.2017.8217847
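
The summary describes extracting a pattern as the shortest path between two terms in a grammatical graph and then matching that pattern to obtain a triplet with a relationship type and direction. The following is a minimal sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the toy sentence, the hand-written dependency edges (a real system would obtain them from a parser), the function names, and the two hard-coded rules that stand in for the decision tree mentioned in the summary.

```python
from collections import deque

# Hypothetical, hand-written dependency edges for the toy sentence
# "ProtA directly phosphorylates ProtB in yeast".
# Each entry is (head, dependent, relation label). These are illustrative
# assumptions, not output from any real parser.
EDGES = [
    ("phosphorylates", "ProtA", "nsubj"),
    ("phosphorylates", "directly", "advmod"),
    ("phosphorylates", "ProtB", "dobj"),
    ("phosphorylates", "in", "prep"),
    ("in", "yeast", "pobj"),
]


def build_graph(edges):
    """Build an undirected adjacency list that remembers, for every step,
    the relation label and whether the step goes up to the head or down."""
    graph = {}
    for head, dep, label in edges:
        graph.setdefault(head, []).append((dep, label, "down"))
        graph.setdefault(dep, []).append((head, label, "up"))
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search. Returns the steps from source to target as
    (word reached, label, direction) tuples, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt, label, direction in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, label, direction)
                queue.append(nxt)
    if target not in parent:
        return None
    steps = []
    node = target
    while parent[node] is not None:
        prev, label, direction = parent[node]
        steps.append((node, label, direction))
        node = prev
    steps.reverse()
    return steps


def extract_triplet(edges, term_a, term_b):
    """Return (agent, relation, target) if the shortest path between the two
    terms matches one of two toy patterns, otherwise None. The two rules are
    a stand-in for a learned decision tree."""
    path = shortest_path(build_graph(edges), term_a, term_b)
    if not path:
        return None
    pattern = tuple(f"{label}:{direction}" for _, label, direction in path)
    verb = path[0][0]  # first word reached on the path
    if pattern == ("nsubj:up", "dobj:down"):
        return (term_a, verb, term_b)  # term_a acts on term_b
    if pattern == ("dobj:up", "nsubj:down"):
        return (term_b, verb, term_a)  # direction is reversed
    return None


if __name__ == "__main__":
    print(extract_triplet(EDGES, "ProtA", "ProtB"))
    print(extract_triplet(EDGES, "ProtB", "ProtA"))
    print(extract_triplet(EDGES, "ProtA", "yeast"))
```

In this sketch the returned tuple carries the three layers named in the summary: the pair of terms, the relation word found on the path, and the direction implied by which term is the subject. Keeping the label and direction on each step lets the same path be read from either end, which is why both query orders yield the same triplet.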