MCRWR: a new method to measure the similarity of documents based on semantic network


Bibliographic Details
Published in BMC Bioinformatics Vol. 23; no. 1; pp. 56 - 17
Main Authors Pan, Xianwei, Huang, Peng, Li, Shan, Cui, Lei
Format Journal Article
Language English
Published England BioMed Central Ltd 01.02.2022
BioMed Central
BMC

Summary: Besides Boolean retrieval with Medical Subject Headings (MeSH), PubMed offers "Related Articles", an alternative way to find and collect relevant documents based on semantic similarity. To make this functionality more efficient and more accurate, we propose an improved algorithm that measures the semantic similarity of PubMed citations using a MeSH-concept network model. Three article similarity networks are constructed using MeSH-concept random walk with restart (MCRWR), MeSH random walk with restart (MRWR), and PubMed related article (PMRA), respectively. The areas under the receiver operating characteristic (ROC) curve for MCRWR, MRWR, and PMRA are 0.93, 0.90, and 0.67, respectively. The precision of MCRWR and MRWR is higher than that of PMRA under various similarity thresholds. The mean P5 of MCRWR is 0.742, which is higher than those of MRWR (0.692) and PMRA (0.223). In the MCRWR-based article semantic similarity network for "Genes & Function of organ & Disease", four topics are identified according to the gold standards. The MeSH-concept random walk with restart algorithm performs better in constructing the article semantic similarity network and can reveal implicit semantic associations between documents. The efficiency and accuracy of retrieving semantically related documents are substantially improved.
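The summary names random walk with restart as the core of MCRWR and MRWR but does not give the details. The sketch below shows only the generic technique on a toy undirected graph. It is not the authors' implementation: the adjacency matrix, the restart probability of 0.15, and the function names are all assumptions made for illustration. It also includes a small helper for precision over the top k results, assuming that "P5" in the summary denotes precision at k = 5.

```python
def random_walk_with_restart(adjacency, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk that restarts at `seed`.

    `adjacency` is a square list of lists of non-negative edge weights.
    `restart` is an assumed value, not one taken from the paper.
    """
    n = len(adjacency)
    # Column-normalise so transition[i][j] is the probability of moving j -> i.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        # One step: follow an edge with prob. (1 - restart), else jump to seed.
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is below the tolerance.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked items that appear in `relevant`."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


# Toy path graph 0 - 1 - 2 - 3 (hypothetical data), walk seeded at node 0.
toy_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_graph, seed=0)
```

The resulting scores can be read as a proximity of every node to the seed. How the paper turns such scores into article-to-article similarity over the MeSH-concept network is not described in this record.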
ISSN:1471-2105
DOI:10.1186/s12859-022-04578-1