A new Persian Text Summarization Approach based on Natural Language Processing and Graph Similarity

Bibliographic Details
Published in	Pizhūhishnāmah-i pardāzish va mudiriyyat-i iṭṭilāʻāt (Online) Vol. 33; no. 2; pp. 885 - 914
Main Authors	Tayyebeh Hosseinikhah, Abbas Ahmadi, Azadeh Mohebi
Format	Journal Article
Language	Persian
Published	Iranian Research Institute for Information and Technology 01.03.2018
Subjects	Extractive Summarization; Natural Language Processing; Part of Speech Tagging; Similarity Graph; Text Mining
Summary:	A significant amount of available information is stored in textual databases, which contain large collections of documents from different sources (such as news, articles, books, emails, and web pages). The increasing visibility and importance of this class of information motivates us to work on better automatic evaluation tools for textual resources. Automatic text summarization is one way to avoid wasting users' time. Extractive text summarization extracts the more important sentences in order to shorten the input text while maintaining the topics covered and the subjects discussed. In this paper, we have tried to improve the accuracy of the extracted summaries by combining natural language processing and text mining techniques. By modifying the mentioned algorithms and sentence scoring measures, accuracy is increased compared to previously used techniques. Part-of-speech tagging is used to calculate a coefficient of each word's importance. This approach in turn helps us pick the more meaningful words and phrases, which results in better accuracy of the system. Graph similarity methods are used to select sentences. Changing the weight of the selected sentences in each step solves the redundancy problem. Standard evaluation measures such as "Precision" and "Recall" are used to evaluate the results on a Persian corpus.
ISSN:	2251-8223 2251-8231
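
The pipeline outlined in the summary (part-of-speech-weighted words, a sentence similarity graph, re-weighting after each selection, and precision/recall scoring) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not give their formulas, so the `POS_WEIGHTS` values, the cosine similarity, the degree-based sentence score, and the `damping` parameter are all assumptions chosen for illustration. The input is assumed to be already tokenized and tagged as `(word, tag)` pairs.

```python
from collections import Counter
from math import sqrt

# Hypothetical importance coefficients per part-of-speech tag (assumption,
# not taken from the paper).
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.7, "ADJ": 0.5}
DEFAULT_WEIGHT = 0.1


def sentence_vector(tagged_sentence):
    """Build a POS-weighted bag of words from (word, tag) pairs."""
    vec = Counter()
    for word, tag in tagged_sentence:
        vec[word] += POS_WEIGHTS.get(tag, DEFAULT_WEIGHT)
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(tagged_sentences, k, damping=1.0):
    """Pick k sentence indices from a similarity graph.

    Each sentence starts with a score equal to its total similarity to the
    other sentences. After each pick, the scores of sentences similar to the
    picked one are scaled down, which discourages redundant selections.
    """
    vecs = [sentence_vector(s) for s in tagged_sentences]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    scores = [sum(row) for row in sim]
    selected = []
    for _ in range(min(k, n)):
        best = max((i for i in range(n) if i not in selected),
                   key=lambda i: scores[i])
        selected.append(best)
        for j in range(n):
            scores[j] *= 1.0 - damping * sim[best][j]
    return sorted(selected)


def precision_recall(selected, reference):
    """Precision and recall of selected sentence indices against a reference."""
    hits = len(set(selected) & set(reference))
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall
```

With `damping=0.0` the re-weighting step is disabled, so two near-identical sentences can both be selected; a positive `damping` makes the second copy less attractive once the first has been picked.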