Malay phoneme-based subword news headline generator for low-resource language

Bibliographic Details
Published in IAES International Journal of Artificial Intelligence, Vol. 13, no. 4, p. 4965
Main Authors Tsann Phua, Yeong; Hooi Yew, Kwang; Fadzil Hassan, Mohd; Yok Wooi, Matthew Teow
Format Journal Article
Language English
Published 01.12.2024

Summary: The rapid growth of technology has significantly increased the number of news articles available to readers. A news headline plays an essential role in attracting readers. Traditionally, crafting the headline is a manual task at the news desk. The motivation of this paper is to address the issues faced in low-resource languages such as Malay. The main contribution is a new hybrid model based on extractive and abstractive text summarization, integrated with a geographical linguistics model. A Malay phoneme-based subword embedding has been developed to address the complex morphology of Malay in computational linguistics applications. The experiments use various sequence-to-sequence (seq2seq) models to generate Malay news headlines, and the out-of-vocabulary (OOV) rate of the models is also assessed. In the experiments, the proposed hybrid text summarization model shows a significant improvement over the baseline models of above 11.00 in ROUGE-1, 4.00 in ROUGE-2, and 11.00 in ROUGE-L. The proposed model can reduce the OOV rate to below 15%.
ISSN: 2089-4872, 2252-8938
DOI: 10.11591/ijai.v13.i4.pp4965-4975