An Efficient Text Summarization Using Term and Inverse Frequency With Key Phrase Identification in Malayalam Language

Bibliographic Details
Published in: 2021 IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE), pp. 145-148
Main Authors: Haroon, Rosna P; M, Abdul Gafur; Ali, Nasreen; U, Barakkath Nisha
Format: Conference Proceeding
Language: English
Published: IEEE, 04.12.2021

Summary: Malayalam is a morphologically rich language. The languages of India belong to several language families, including Indo-Aryan, Sino-Tibetan, and Dravidian; Malayalam belongs to the Dravidian family. Text summarization in Indian languages is difficult because of their linguistic richness and the limited availability of annotated data. Here we propose a summarization system for Malayalam documents based on the Term Frequency-Inverse Document Frequency (TF-IDF) weighting scheme. The summarizer accepts a single Malayalam text document as input and generates a summary by combining TF-IDF scoring with keyword identification. The proposed method summarized Malayalam literature documents with 90.6% accuracy.
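To make the TF-IDF-with-keywords idea concrete, here is a minimal extractive-summarization sketch. It is not the authors' implementation, and the record does not give their formulas. Everything here is an assumption: each sentence is treated as a "document" for IDF, with tf-idf(t, s) = (count of t in s / length of s) · log(N / df(t)). The top-scoring terms serve as keywords, and each distinct keyword in a sentence adds a hypothetical `keyword_bonus`. The helper names (`split_sentences`, `tokenize`, `summarize`) and parameters (`ratio`, `n_keywords`, `keyword_bonus`) are illustrative only.

```python
import math
import re
import string
from collections import Counter

# Characters stripped from token edges (ASCII punctuation plus curly quotes).
PUNCT = string.punctuation + "\u201c\u201d\u2018\u2019"


def split_sentences(text: str) -> list[str]:
    """Split on whitespace following '.', '?' or '!' (assumed sentence ends)."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Whitespace tokenizer with no stemming or stop-word removal (an assumption).

    Whitespace splitting is used because a regex word class can break Malayalam
    words apart at combining vowel signs.
    """
    tokens = (w.strip(PUNCT) for w in sentence.split())
    return [t for t in tokens if t]


def summarize(text: str, ratio: float = 0.3, n_keywords: int = 5,
              keyword_bonus: float = 0.5) -> tuple[list[str], list[str]]:
    """Return (summary_sentences, keywords) using sentence-level TF-IDF."""
    sentences = split_sentences(text)
    if not sentences:
        return [], []
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: the number of sentences containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))

    # TF-IDF weight of each term in each sentence (sentence = "document").
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})

    # Keywords (hypothetical rule): terms with the largest summed TF-IDF.
    term_scores = Counter()
    for w in weights:
        term_scores.update(w)
    keywords = [t for t, _ in term_scores.most_common(n_keywords)]
    keyword_set = set(keywords)

    # Sentence score: summed TF-IDF plus a bonus per distinct keyword present.
    scores = [sum(w.values()) + keyword_bonus * len(keyword_set & set(toks))
              for w, toks in zip(weights, tokenized)]

    # Keep the top-k sentences, then restore their original document order.
    k = max(1, round(ratio * n))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)], keywords
```

A call such as `summary, keywords = summarize(malayalam_text, ratio=0.3)` returns about 30% of the sentences in their original order, along with the extracted keywords. A term that appears in every sentence gets an IDF of zero, so it can never dominate the score. This sketch makes no attempt to reproduce the reported 90.6% accuracy. A real Malayalam system would likely also need morphological analysis or stemming.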
DOI:10.1109/WIECON-ECE54711.2021.9829671