Bengali News Headline Categorization Using Optimized Machine Learning Pipeline

Bibliographic Details
Published in: International Journal of Information Engineering and Electronic Business, Vol. 13, no. 1, pp. 15-24
Main Authors: Dhar, Prashengit; Abedin, Md. Zainal
Format: Journal Article
Language: English
Published: Hong Kong: Modern Education and Computer Science Press, 08.02.2021

Summary: Bengali text-based news portals are now very common, and their number is increasing day by day. With easy access to internet technology, reading news online has become a routine task. News portals present many different types of news. The system presented in this paper categorizes the headlines of news portals and sites, with predictions made by a machine learning algorithm. A large amount of collected data is used for training and testing. Pre-processing tasks such as tokenization, digit removal, removal of punctuation marks and symbols, and deletion of stop words are performed. A set of stop words is also created manually; a strong stop-word list leads to better performance, and stop-word deletion plays a leading role in feature selection. For optimization, a genetic algorithm is used, which reduces the feature size. A comparison with results obtained without the optimization process is also presented. The dataset was built by collecting news headlines from various Bengali news portals and sites. The results show good categorization performance.
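The pre-processing steps listed in the summary (tokenization, digit removal, punctuation and symbol removal, stop-word deletion) can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the stop-word list `STOP_WORDS`, the extra marks in `EXTRA_MARKS`, and the function `preprocess` are hypothetical, and the paper's manually built stop-word list is not reproduced.

```python
import string
from typing import Iterable, List

# Hypothetical stop-word list: a few common Bengali function words for illustration.
# The paper builds its own list manually; that list is not reproduced here.
STOP_WORDS = {"ও", "এবং", "এর", "থেকে", "করে", "না", "যে", "হয়"}

# Characters to strip: ASCII digits, Bengali digits (U+09E6 to U+09EF), ASCII
# punctuation, plus a few Bengali and typographic marks (danda, double danda,
# taka sign, curly quotes, en and em dashes). This set is an assumption.
BENGALI_DIGITS = "".join(chr(cp) for cp in range(0x09E6, 0x09F0))
EXTRA_MARKS = "\u0964\u0965\u09F3\u2018\u2019\u201C\u201D\u2013\u2014"
_NOISE = string.digits + BENGALI_DIGITS + string.punctuation + EXTRA_MARKS
_TO_SPACE = str.maketrans({ch: " " for ch in _NOISE})


def preprocess(headline: str, stop_words: Iterable[str] = STOP_WORDS) -> List[str]:
    """Clean one headline and return its remaining tokens."""
    # Replace digits, punctuation and symbols with spaces (rather than deleting
    # them) so words separated only by a mark such as the danda stay separate,
    # then tokenize on whitespace.
    tokens = headline.translate(_TO_SPACE).split()
    stops = set(stop_words)
    # Stop-word deletion runs last, once the noise characters are gone.
    return [t for t in tokens if t not in stops]


if __name__ == "__main__":
    print(preprocess("সরকার ও বিরোধী দলের বৈঠক \u09e8\u09e6\u09e8\u09e7 সালে\u0964"))
```

`str.translate` is used here instead of a `[^\w\s]` regular expression because Python's `re` `\w` does not match Bengali vowel signs such as া (they are Unicode combining marks), so such a regex would mangle Bengali words.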
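The summary states that a genetic algorithm is used for optimization and that it reduces the feature size. A common way to do this is to encode each candidate feature subset as a bit mask and evolve the masks. The sketch below assumes binary tournament selection, single-point crossover, bit-flip mutation and elitism; none of these choices, nor the parameter values or the names `genetic_feature_selection` and `toy_fitness`, come from the paper. In practice the fitness would likely be a classifier's validation accuracy on the selected columns of the headline feature matrix, possibly minus a size penalty; here a hypothetical toy fitness stands in for it.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # one gene per feature: 1 = keep the feature, 0 = drop it


def genetic_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 30,
    generations: int = 40,
    crossover_rate: float = 0.9,
    mutation_rate: float = 0.02,
    seed: int = 0,
) -> Mask:
    """Evolve binary feature masks and return the fittest one found.

    Requires n_features >= 2 (for single-point crossover) and pop_size >= 2.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    for _ in range(generations):
        scored: List[Tuple[float, Mask]] = [(fitness(m), m) for m in population]

        def select() -> Mask:
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        # Elitism: copy the current best mask unchanged into the next generation.
        elite = max(scored, key=lambda s: s[0])[1]
        next_pop = [elite[:]]
        while len(next_pop) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation toggles each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop

    # Because of elitism, the best fitness never decreases across generations.
    return max(population, key=fitness)


# Hypothetical stand-in for classifier accuracy: pretend columns 0, 2 and 5 carry
# the signal, and charge 0.1 per kept feature to favour smaller subsets.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


if __name__ == "__main__":
    best = genetic_feature_selection(10, toy_fitness)
    print(best, toy_fitness(best))
```

The size penalty is what pushes the search toward smaller feature sets; without it, keeping every feature would never be worse than keeping a subset under this toy fitness.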
ISSN: 2074-9023, 2074-9031
DOI: 10.5815/ijieeb.2021.01.02