A new Persian Text Summarization Approach based on Natural Language Processing and Graph Similarity

Bibliographic Details
Published in Pizhūhishnāmah-i pardāzish va mudiriyyat-i iṭṭilāʻāt (Online) Vol. 33; no. 2; pp. 885-914
Main Authors Tayyebeh Hosseinikhah, Abbas Ahmadi, Azadeh Mohebi
Format Journal Article
Language Persian
Published Iranian Research Institute for Information and Technology 01.03.2018
Summary: A significant amount of available information is stored in textual databases, which contain large collections of documents from different sources (such as news, articles, books, emails and web pages). The increasing visibility and importance of this class of information motivates us to work on better automatic evaluation tools for textual resources. Automatic text summarization is one way to avoid wasting users' time. Extractive text summarization extracts the more important sentences in order to shorten the input text while maintaining the topics covered and the subjects discussed. In this paper, we try to improve the accuracy of the extracted summaries by combining natural language processing and text mining techniques. By modifying these algorithms and the sentence scoring measures, accuracy is increased compared to previously used techniques. Part-of-speech tagging is used to calculate a coefficient of each word's importance. This approach in turn helps us pick the more meaningful words and phrases, which results in better accuracy of the system. Graph similarity methods are used to select sentences. Changing the weight of the selected sentences in each step helps solve the redundancy problem. Standard evaluation measures such as "Precision" and "Recall" are used to evaluate results on a Persian corpus.
ISSN: 2251-8223, 2251-8231
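
The pipeline outlined in the summary (part-of-speech-based word coefficients, a sentence similarity graph, re-weighting after each selection to limit redundancy, and precision/recall evaluation) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no formulas, so the coefficient table `POS_WEIGHT`, the cosine similarity, the degree-based sentence score, the multiplicative re-weighting rule, and all function names are assumptions made for illustration. Input sentences are assumed to be already tokenized and tagged as (word, tag) pairs by some external Persian tagger.

```python
import math
from collections import Counter

# Hypothetical POS coefficients; the abstract does not state the real values.
POS_WEIGHT = {"NOUN": 1.0, "VERB": 0.6, "ADJ": 0.4}
DEFAULT_WEIGHT = 0.1


def sentence_vector(tagged_sentence):
    """Turn a list of (word, pos) pairs into a POS-weighted bag of words."""
    vec = Counter()
    for word, pos in tagged_sentence:
        vec[word] += POS_WEIGHT.get(pos, DEFAULT_WEIGHT)
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(tagged_sentences, k):
    """Pick up to k sentence indices from a similarity graph.

    Each sentence is a node; edge weights are cosine similarities.
    A sentence's initial score is the sum of its edge weights.
    After each pick, the scores of the remaining sentences are scaled
    down in proportion to their similarity to the picked sentence
    (an assumed re-weighting rule, used here to reduce redundancy).
    """
    vecs = [sentence_vector(s) for s in tagged_sentences]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    score = [sum(row) for row in sim]
    candidates = set(range(n))
    selected = []
    while candidates and len(selected) < k:
        best = max(sorted(candidates), key=lambda i: score[i])
        selected.append(best)
        candidates.remove(best)
        for i in candidates:
            score[i] *= max(0.0, 1.0 - sim[i][best])
    return sorted(selected)


def precision_recall(selected, reference):
    """Precision and recall of selected sentence indices against a reference summary."""
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy English stand-in for tagged Persian text; sentences 0 and 1 are duplicates.
    doc = [
        [("cat", "NOUN"), ("sat", "VERB"), ("mat", "NOUN")],
        [("cat", "NOUN"), ("sat", "VERB"), ("mat", "NOUN")],
        [("cat", "NOUN"), ("dog", "NOUN")],
        [("dog", "NOUN"), ("ran", "VERB")],
    ]
    summary = summarize(doc, k=3)
    print(summary, precision_recall(summary, reference=[0, 2]))
```

In the toy document, sentences 0 and 1 are identical, so once one of them is selected the other's score is scaled to zero and a less similar sentence is chosen instead. This is only one plausible reading of "changing the weight of the selected sentences in each step"; the paper itself may use a different rule.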