Extractive text summarisation using Bayesian state estimation of sentences: A Markovian framework

Bibliographic Details
Published in: Journal of information science, Vol. 50, no. 4, pp. 1005-1018
Main Authors: Ghanbari Haez, Saba; Shamsfakhr, Farhad
Format: Journal Article
Language: English
Published: London, England: SAGE Publications, 01.08.2024
Bowker-Saur Ltd
Summary: Identifying and extracting valuable information from textual documents in the form of cohesive, well-developed summaries is one of the most challenging tasks in text mining and natural language processing. In this article, we present a sequential Markov model, equipped with Bayesian inference, to estimate the degree of importance of sentences in a document and thereby address the text summarisation problem. The proposed methodology models extractive sentence summarisation as a Bayesian state estimation problem, where the system state is the degree of importance of each sentence in a document. The transition and observation models are derived using nonlinear dynamical system identification based on a recurrent feedback neural model that predicts the sentence observation from the sentence input data. The transition and observation probability density functions are then modelled using a mixture density network. The performance of the system is assessed by investigating the optimal feature dimensionality and the impact of the model parameters on system accuracy, using entropy-based and loss-based risk measures. Finally, the superiority of the proposed methodology over the state of the art in extractive summarisation is discussed and verified by reporting recall, precision and accuracy on real-world benchmark data sets.
ISSN: 0165-5515, 1741-6485
DOI: 10.1177/01655515221112842
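
Illustrative sketch: the summary frames sentence importance as a hidden state that is estimated recursively from per-sentence observations. The Python snippet below shows the generic predict-and-update cycle of a discrete Bayes filter for that idea. It is not the authors' method: the article learns its transition and observation densities with a recurrent neural model and a mixture density network, whereas this sketch assumes a hand-set two-state transition matrix and Gaussian observation likelihoods. All names and numbers (`bayes_filter`, `transition`, `means`, `stds`, `scores`) are hypothetical.

```python
import numpy as np


def bayes_filter(observations, transition, means, stds, prior):
    """Recursive Bayes filter over a discrete importance state.

    transition[i, j] is the assumed probability of moving from state i to
    state j between consecutive sentences. means/stds define an assumed
    Gaussian observation likelihood for each state.
    Returns one posterior distribution per sentence.
    """
    belief = np.asarray(prior, dtype=float)
    posteriors = []
    for z in observations:
        # Predict: propagate the previous belief through the transition model.
        predicted = transition.T @ belief
        # Update: weight each state by how well it explains the observation.
        likelihood = np.exp(-0.5 * ((z - means) / stds) ** 2) / (
            stds * np.sqrt(2.0 * np.pi)
        )
        belief = likelihood * predicted
        belief /= belief.sum()  # normalise to a probability distribution
        posteriors.append(belief.copy())
    return np.array(posteriors)


# Hypothetical two-state model: state 0 = "unimportant", state 1 = "important".
transition = np.array([[0.8, 0.2],
                       [0.3, 0.7]])
means = np.array([0.2, 0.8])   # assumed typical feature score per state
stds = np.array([0.2, 0.2])
prior = np.array([0.5, 0.5])

# Made-up scalar feature scores, one per sentence in document order.
scores = [0.1, 0.9, 0.75, 0.3]

posteriors = bayes_filter(scores, transition, means, stds, prior)
importance = posteriors[:, 1]  # posterior probability of "important"

# Extract the two sentences with the highest posterior importance,
# reported in document order.
summary_idx = sorted(np.argsort(importance)[-2:].tolist())
print(importance.round(3), summary_idx)
```

In this toy setting the extractive summary is simply the top-ranked sentences by posterior importance; the fixed matrices stand in for whatever learned densities a real system would supply.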