Malay phoneme-based subword news headline generator for low-resource language
Published in | IAES international journal of artificial intelligence Vol. 13; no. 4; p. 4965 |
---|---|
Format | Journal Article |
Language | English |
Published | 01.12.2024 |
Summary: | The rapid growth of technology has significantly increased the number of news articles available to readers. A news headline plays an essential role in attracting readers. Traditionally, crafting the headline has been a manual task at the news desk. This paper addresses the issues faced in low-resource languages, such as Malay. Its main contribution is a new hybrid model that combines extractive and abstractive text summarization and integrates a geographical linguistics model; a Malay phoneme-based subword embedding has been developed to address the complex morphology of Malay in computational linguistic applications. The experiments use various sequence-to-sequence (seq2seq) models to generate Malay news headlines. In addition, the out-of-vocabulary (OOV) rate of the models is assessed. In the experiments, the proposed hybrid text summarization model improves significantly over the baseline models, by more than 11.00 in ROUGE-1, 4.00 in ROUGE-2, and 11.00 in ROUGE-L. The proposed model can reduce the OOV rate to below 15%. |
---|---|
ISSN: | 2089-4872, 2252-8938 |
DOI: | 10.11591/ijai.v13.i4.pp4965-4975 |
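
The summary reports results as an OOV rate and ROUGE scores. As a rough illustration of what those two measures compute, here is a minimal Python sketch. It assumes whitespace-style word tokens, a token-level OOV rate, and a plain unigram-overlap ROUGE-1 F1; this record does not state the tokenization or ROUGE variant the paper used, so those choices may differ. The function names, the toy vocabulary, and the example headline are hypothetical and are not taken from the paper. The sketch returns fractions between 0 and 1, whereas the summary appears to quote ROUGE on a 0 to 100 scale.

```python
from collections import Counter


def oov_rate(tokens, vocabulary):
    """Fraction of tokens that do not appear in the vocabulary (token-level)."""
    if not tokens:
        return 0.0
    missing = sum(1 for tok in tokens if tok not in vocabulary)
    return missing / len(tokens)


def rouge_1_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate and a reference token list."""
    if not candidate or not reference:
        return 0.0
    # Counter intersection clips each unigram to its minimum count in both lists.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


# Made-up example data, for illustration only.
vocabulary = {"harga", "minyak", "naik"}
reference_headline = ["harga", "minyak", "naik", "lagi"]
generated_headline = ["harga", "minyak", "naik"]

rate = oov_rate(reference_headline, vocabulary)
score = rouge_1_f1(generated_headline, reference_headline)
print(f"OOV rate: {rate:.2f}, ROUGE-1 F1: {score:.4f}")
```

A subword embedding of the kind the summary describes would typically lower the OOV rate by splitting unseen words into known units before the vocabulary lookup; the sketch above does not model that step.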