Computing semantic similarity between biomedical concepts using new information content approach

Bibliographic Details
Published in: Journal of biomedical informatics Vol. 59; pp. 258-275
Main Authors: Ben Aouicha, Mohamed; Hadj Taieb, Mohamed Ali
Format: Journal Article
Language: English
Published: United States, Elsevier Inc, 01.02.2016

Summary: Excerpt of MeSH taxonomy modeling the Information Contents (ICs) of the biomedical concepts “Antigens, Differentiation, T-Lymphocyte” (C1) and “Receptors, Interleukin-7” (C2). The IC is quantified using the ancestors’ subgraph and the topological parameters (depth and descendants) of each ancestor. The lowest common subsumer (LCS) is the concept “Antigens, Differentiation”. The semantic similarity is computed using IC(C1), IC(C2) and IC(LCS).

• Study of MeSH taxonomy topological parameters and their semantic interpretations.
• New information content computing method based on the ancestors’ subgraph from the taxonomy.
• Experiments are conducted on 3 biomedical benchmarks for semantic similarity.
• It outperforms the known IC-based measures of semantic similarity.

The exploitation of heterogeneous clinical sources and healthcare records is fundamental in clinical and translational research. The determination of semantic similarity between word pairs is an important component of text understanding that enables the processing and structuring of textual resources. Several semantic similarity measures have been adapted to the biomedical field by incorporating domain information extracted from clinical data or from medical ontologies such as MeSH. This study focuses on Information Content (IC) based measures that exploit the topological parameters of the taxonomy to express the semantics of a concept. A new intrinsic IC computing method, based on the taxonomical parameters of the ancestors’ subgraph, is proposed to assign an IC value to each biomedical concept in the “is a” hierarchy. Moreover, we present a study of the topological parameters of the MeSH taxonomy. This study addresses the semantic interpretation of the depth and descendants’ subgraph parameters and the different ways of expressing them. Using MeSH as the input ontology, the accuracy of our proposal is evaluated and compared against other IC-based measures on several widely used benchmarks of biomedical terms. The correlation between the similarity scores obtained with the proposed approach and the ratings of human experts shows that our proposal outperforms the previous measures.
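
The summary states that similarity is computed from IC(C1), IC(C2) and IC(LCS). As a rough illustration only, the Python sketch below shows that general pattern on a tiny hand-made hierarchy. This record does not give the authors' IC formula, so the sketch substitutes two well-known published measures: the intrinsic IC of Seco et al. (based on descendant counts only, not on the ancestors' subgraph) and Lin's similarity. The `PARENT` map is a hypothetical toy tree, not the real MeSH structure, and all function names are illustrative.

```python
import math

# Hypothetical toy "is a" hierarchy (child -> parent). This is NOT the real
# MeSH tree; it only places C1 and C2 under the LCS named in the summary.
PARENT = {
    "Antigens": "Root",
    "Antigens, Surface": "Antigens",
    "Antigens, Differentiation": "Antigens",
    "Antigens, Differentiation, T-Lymphocyte": "Antigens, Differentiation",
    "Receptors, Interleukin-7": "Antigens, Differentiation",
}
CONCEPTS = set(PARENT) | set(PARENT.values())


def ancestors(concept):
    """Return the path from the concept up to the root, concept included."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def descendant_count(concept):
    """Count the concepts that have `concept` as a proper ancestor."""
    return sum(1 for c in CONCEPTS if c != concept and concept in ancestors(c))


def intrinsic_ic(concept):
    """Seco et al. intrinsic IC: 1 - log(descendants + 1) / log(total concepts).

    Used here as a stand-in; the article's own IC method is not reproduced.
    """
    return 1.0 - math.log(descendant_count(concept) + 1) / math.log(len(CONCEPTS))


def lowest_common_subsumer(c1, c2):
    """Return the deepest concept that subsumes both c1 and c2."""
    shared = set(ancestors(c2))
    for candidate in ancestors(c1):  # walks upward, so the first hit is deepest
        if candidate in shared:
            return candidate
    return None


def lin_similarity(c1, c2):
    """Lin's measure: 2 * IC(LCS) / (IC(c1) + IC(c2))."""
    denominator = intrinsic_ic(c1) + intrinsic_ic(c2)
    if denominator == 0:
        return 0.0
    return 2.0 * intrinsic_ic(lowest_common_subsumer(c1, c2)) / denominator


C1 = "Antigens, Differentiation, T-Lymphocyte"
C2 = "Receptors, Interleukin-7"
print(lowest_common_subsumer(C1, C2))    # Antigens, Differentiation
print(round(lin_similarity(C1, C2), 3))  # 0.387
```

Any IC function with the same signature could be swapped in for `intrinsic_ic` without changing `lin_similarity`, which is the design that makes IC-based measures straightforward to compare against each other on a benchmark.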
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.jbi.2015.12.007