Computing semantic similarity between biomedical concepts using new information content approach
Published in | Journal of biomedical informatics Vol. 59; pp. 258 - 275 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | United States, Elsevier Inc, 01.02.2016 |
Summary: | Excerpt of MeSH taxonomy modeling the Information Contents (ICs) of the biomedical concepts “Antigens, Differentiation, T-Lymphocyte” (C1) and “Receptors, Interleukin-7” (C2). The IC is quantified using the ancestors’ subgraph and the topological parameters (depth and descendants) of each ancestor. The lowest common subsumer (LCS) is the concept “Antigens, Differentiation”. The semantic similarity is computed using IC(C1), IC(C2), and IC(LCS).
• Study of the topological parameters of the MeSH taxonomy and their semantic interpretations.
• New information content computation method based on the ancestors’ subgraph of the taxonomy.
• Experiments are conducted on three biomedical benchmarks for semantic similarity.
• The method outperforms known IC-based measures of semantic similarity.
The exploitation of heterogeneous clinical sources and healthcare records is fundamental in clinical and translational research. Determining the semantic similarity between word pairs is an important component of text understanding that enables the processing and structuring of textual resources. Several semantic similarity measures have been adapted to the biomedical field by incorporating domain information extracted from clinical data or from medical ontologies such as MeSH. This study focuses on Information Content (IC) based measures that exploit the topological parameters of the taxonomy to express the semantics of a concept. A new intrinsic IC computing method, based on the taxonomical parameters of the ancestors’ subgraph, is proposed to assign an IC value to a biomedical concept in the “is a” hierarchy. Moreover, we present a study of the topological parameters of the MeSH taxonomy. This study covers the semantic interpretation of, and the different ways of expressing, the depth and descendants’ subgraph parameters. Using MeSH as the input ontology, the accuracy of our proposal is evaluated and compared against other IC-based measures on several widely used benchmarks of biomedical terms. The correlation between the results obtained for the evaluated measure using the proposed approach and the ratings of human experts shows that our proposal outperforms the previous measures. |
---|---|
ISSN: | 1532-0464 1532-0480 |
DOI: | 10.1016/j.jbi.2015.12.007 |
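
The summary describes computing similarity from IC(C1), IC(C2), and IC(LCS) over an “is a” hierarchy. The following is a minimal sketch of that general pipeline, not the article's new IC method, whose formula is not given in this record. As stand-ins it assumes a descendant-based intrinsic IC of the form 1 - log(desc + 1) / log(N) and Lin's measure 2·IC(LCS) / (IC(C1) + IC(C2)). The `PARENT` map is a hypothetical toy hierarchy; apart from the LCS relation stated in the summary, its edges and the `Root`, `Antigens`, and `Other Leaf` nodes are illustrative assumptions and do not reproduce the MeSH tree.

```python
import math

# Hypothetical toy "is a" hierarchy (child -> parent), for illustration only.
# It is NOT the actual MeSH tree structure.
PARENT = {
    "Antigens": "Root",
    "Antigens, Differentiation": "Antigens",
    "Antigens, Differentiation, T-Lymphocyte": "Antigens, Differentiation",
    "Receptors, Interleukin-7": "Antigens, Differentiation",
    "Other Leaf": "Antigens",
}
NODES = set(PARENT) | set(PARENT.values())


def ancestors(concept):
    """Return the concept followed by its ancestors, ending at the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def descendant_count(concept):
    """Count the nodes that have `concept` as a proper ancestor."""
    return sum(1 for n in NODES if n != concept and concept in ancestors(n))


def intrinsic_ic(concept):
    """Descendant-based intrinsic IC (assumed stand-in, not the paper's formula)."""
    return 1.0 - math.log(descendant_count(concept) + 1) / math.log(len(NODES))


def lowest_common_subsumer(c1, c2):
    """Return the deepest concept that subsumes both c1 and c2."""
    ancestors_of_c2 = set(ancestors(c2))
    for candidate in ancestors(c1):  # ordered from most to least specific
        if candidate in ancestors_of_c2:
            return candidate
    raise ValueError("concepts share no common ancestor")


def lin_similarity(c1, c2):
    """Lin's measure: 2 * IC(LCS) / (IC(c1) + IC(c2))."""
    lcs = lowest_common_subsumer(c1, c2)
    denom = intrinsic_ic(c1) + intrinsic_ic(c2)
    return 2.0 * intrinsic_ic(lcs) / denom if denom else 0.0


C1 = "Antigens, Differentiation, T-Lymphocyte"
C2 = "Receptors, Interleukin-7"
print(lowest_common_subsumer(C1, C2), round(lin_similarity(C1, C2), 3))
```

In this toy tree the leaves receive the maximum IC and the root receives zero, so the similarity of C1 and C2 is driven entirely by how specific their LCS is. A method such as the one summarized above would replace `intrinsic_ic` with a value derived from the depth and descendants of each ancestor.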