Indian News Headlines Classification using Word Embedding Techniques and LSTM Model

Bibliographic Details
Published in: Procedia Computer Science, Vol. 218, pp. 899-907
Main Authors: Khuntia, Madhusmita; Gupta, Deepa
Format: Journal Article
Language: English
Published: Elsevier B.V., 2023
Summary: Newspapers introduce us to the latest happenings around the world. Going paperless creates more opportunities for newspapers, such as broadcasting news coverage and presenting breaking news conveniently. News headlines fall under the short-text category and are an active subject of research. Creating a dense vector from short texts has become a challenging and essential task in many applications, such as recommender systems, context analysis, decision making, and text classification. This work not only builds a classification model for short text but also categorizes headlines that carry the 'unknown' category. It uses Bidirectional Encoder Representations from Transformers (BERT), a cosine similarity index, word embeddings, and a Long Short-Term Memory (LSTM) network to classify news headlines into multiple categories. The proposed method performs well at labeling the unlabeled data with the help of a BERT sentence encoder. The system uses an LSTM to learn the headlines as input vectors, and a classifier then assigns each headline to a category. At the end of the experiment, the designed pipeline achieves remarkable precision at the class level.
ISSN: 1877-0509
DOI: 10.1016/j.procs.2023.01.070
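
The summary mentions labeling unlabeled headlines with a sentence encoder and a cosine similarity index. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes each headline and each category is already represented by a dense vector (small hand-written vectors stand in for BERT sentence embeddings here), and it assumes a similarity threshold below which a headline stays 'unknown'. The function names, the threshold value of 0.5, and the toy categories are all hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Return pairwise cosine similarities between rows of a and rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def label_by_similarity(headline_vecs, category_vecs, category_names, threshold=0.5):
    """Assign each headline the most similar category, or 'unknown'.

    The threshold is an assumed value for illustration only.
    """
    sims = cosine_similarity_matrix(headline_vecs, category_vecs)
    best_index = sims.argmax(axis=1)
    best_score = sims.max(axis=1)
    return [
        category_names[i] if score >= threshold else "unknown"
        for i, score in zip(best_index, best_score)
    ]


# Toy 3-dimensional vectors standing in for sentence embeddings (assumption).
categories = ["sports", "business"]
category_vecs = np.array([[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]])
headline_vecs = np.array([[0.9, 0.1, 0.0],   # close to "sports"
                          [0.1, 0.8, 0.1],   # close to "business"
                          [0.0, 0.0, 1.0]])  # orthogonal to both categories

labels = label_by_similarity(headline_vecs, category_vecs, categories)
print(labels)  # ['sports', 'business', 'unknown']
```

In a real pipeline the vectors would come from a sentence encoder, and the resulting labels could then serve as training targets for a sequence classifier such as an LSTM; how the paper sets its threshold and category representations is not stated in this record.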