Clustering cliques for graph-based summarization of the biomedical research literature


Bibliographic Details
Published in: BMC Bioinformatics, Vol. 14, no. 1, p. 182
Main Authors: Zhang, Han; Fiszman, Marcelo; Shin, Dongwook; Wilkowski, Bartlomiej; Rindflesch, Thomas C
Format: Journal Article
Language: English
Published: England, BioMed Central Ltd, 07.06.2013
Summary: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). SemRep is used to extract semantic predications from the citations returned by a PubMed search. Cliques are identified from frequently occurring predications whose highly connected arguments have been filtered by degree centrality. Themes contained in the summary are identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the summaries produced was compared to a Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings. For 11 topics in the testing data set, the overall validity of clusters from the system summary was 10 percentage points higher than the baseline (43% versus 33%). When compared to the reference standard from MeSH headings, the results for recall, precision and F-score were 0.64, 0.65, and 0.65 respectively.
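The pipeline outlined in the summary (degree-centrality filtering, clique identification, then hierarchical clustering of cliques by shared arguments) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy predications, the thresholds, the Jaccard similarity and the single-linkage merge rule are all assumptions made here, and frequency filtering and SemRep extraction are left out.

```python
from itertools import combinations

# Made-up (subject, predicate, object) triples, for illustration only.
TOY_PREDICATIONS = [
    ("metformin", "TREATS", "diabetes"), ("insulin", "TREATS", "diabetes"),
    ("metformin", "INTERACTS_WITH", "insulin"), ("insulin", "AFFECTS", "glucose"),
    ("diabetes", "ASSOCIATED_WITH", "glucose"), ("aspirin", "TREATS", "headache"),
    ("aspirin", "TREATS", "fever"), ("headache", "COEXISTS_WITH", "fever"),
    ("ibuprofen", "TREATS", "fever"), ("ibuprofen", "TREATS", "headache"),
    ("caffeine", "AFFECTS", "headache"),
]

def build_graph(edges):
    """Return undirected adjacency sets and normalized degree centrality."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    n = len(adj)
    cent = {v: (len(nb) / (n - 1) if n > 1 else 0.0) for v, nb in adj.items()}
    return adj, cent

def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    found = []
    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return found

def jaccard(a, b):
    """Share of arguments two cliques have in common (assumed similarity)."""
    return len(a & b) / len(a | b)

def cluster_cliques(cliques, min_sim):
    """Agglomerative single-linkage merging until similarity drops below min_sim."""
    clusters = [[c] for c in cliques]
    while len(clusters) > 1:
        best, pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sim = max(jaccard(a, b) for a in clusters[i] for b in clusters[j])
            if sim > best:
                best, pair = sim, (i, j)
        if pair is None or best < min_sim:
            break
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def summarize(predications, min_centrality=0.2, min_size=3, min_sim=0.3):
    """Return one set of arguments per theme (cluster of cliques)."""
    edges = {(s, o) for s, _, o in predications if s != o}
    adj, cent = build_graph(edges)
    keep = {v for v, c in cent.items() if c >= min_centrality}
    sub = {v: adj[v] & keep for v in keep}  # graph restricted to central arguments
    cliques = [c for c in maximal_cliques(sub) if len(c) >= min_size]
    return [frozenset().union(*cl) for cl in cluster_cliques(cliques, min_sim)]

themes = summarize(TOY_PREDICATIONS)
for theme in themes:
    print(sorted(theme))
```

On the toy data, the weakly connected argument is dropped by the centrality filter and the remaining cliques fall into two themes, one per group of cliques that share arguments. A real run would need predications extracted from PubMed citations and thresholds tuned to the data.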
ISSN: 1471-2105
DOI: 10.1186/1471-2105-14-182