Indian News Headlines Classification using Word Embedding Techniques and LSTM Model

Bibliographic Details
Published in	Procedia computer science Vol. 218; pp. 899 - 907
Main Authors	Khuntia, Madhusmita; Gupta, Deepa
Format	Journal Article
Language	English
Published	Elsevier B.V. 2023
Subjects	BiLSTM; LSTM; Multi-label classification; News headlines; Word Embeddings

Summary:	Newspapers introduce us to the latest happenings around the world. Going paperless creates more opportunities for newspapers, such as broadcasting news coverage and presenting breaking news conveniently. News headlines fall under the short-text category and are an active subject of research. Creating a dense vector from short texts has become a challenging and essential task in many applications, such as recommender systems, context analysis, decision making, and text classification. This work not only builds a classification model for short text but also categorizes headlines that carry the ‘unknown’ category. It uses Bidirectional Encoder Representations from Transformers (BERT), a cosine similarity index, word embeddings, and a Long Short-Term Memory (LSTM) network to classify news headlines into multiple categories. The proposed method performs well at labeling the unlabeled data with the help of a BERT sentence encoder. The system feeds the headlines to the LSTM as input vectors, and a classifier then assigns a category to each headline. At the end of the experiment, the designed pipeline achieves remarkable precision at the class level.
ISSN:	1877-0509
DOI:	10.1016/j.procs.2023.01.070
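
The Summary mentions labeling unlabeled headlines with a sentence encoder and a cosine similarity index, with an ‘unknown’ category for headlines that fit nowhere. The record does not give the authors' exact procedure, so the following is only a minimal sketch of one way such a step could work. The function names, the per-category reference vectors, the 0.5 threshold, and the toy 3-dimensional vectors are all hypothetical; in practice the vectors would come from a BERT sentence encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def label_headline(embedding, category_vectors, threshold=0.5):
    """Return (label, score) for the most similar category.

    If the best score is below `threshold` (an assumed cut-off, not a
    value from the paper), the headline is labeled 'unknown'.
    """
    best_label, best_score = "unknown", -1.0
    for label, reference in category_vectors.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return "unknown", best_score
    return best_label, best_score


# Toy vectors standing in for sentence-encoder outputs (hypothetical values).
category_vectors = {
    "sports": [1.0, 0.0, 0.0],
    "business": [0.0, 1.0, 0.0],
}
sports_like = [0.9, 0.1, 0.0]  # close to the "sports" reference vector
off_topic = [0.0, 0.0, 1.0]    # orthogonal to every reference vector

print(label_headline(sports_like, category_vectors))
print(label_headline(off_topic, category_vectors))
```

In this sketch the first headline is assigned to "sports" and the second falls back to "unknown" because no category reaches the threshold. Headlines labeled this way could then serve as training data for an LSTM classifier, as the Summary describes.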