Bengali & Banglish: A monolingual dataset for emotion detection in linguistically diverse contexts

Bibliographic Details
Published in: Data in Brief, Vol. 55; p. 110760
Main Authors: Faisal, Moshiur Rahman; Shifa, Ashrin Mobashira; Rahman, Md Hasibur; Uddin, Mohammed Arif; Rahman, Rashedur M.
Format: Journal Article
Language: English
Published: Netherlands, Elsevier Inc, 01.08.2024
Summary: The ever-evolving global landscape of communication, driven by advances in Information Technology, underscores the importance of emotion detection in natural language processing. However, challenges persist in interpreting emotions within linguistically diverse contexts, notably in low-resource languages like Bengali, compounded by the emergence of Banglish. To address this gap, we present “Bengali & Banglish,” an extensive dataset comprising 80,098 labelled samples across six emotion classes. Our dataset fills a void in fine-grained emotion classification for Bengali and pioneers emotion detection in Banglish. Through meticulous annotation and rigorous evaluation, we achieve weighted F1 scores of 71.30% for Bengali and 64.59% for Banglish using BanglaBERT. The dataset also facilitates Bengali-to-Banglish Machine Translation, contributing to the advancement of language processing models. Furthermore, the annotations reach a high Cohen's Kappa score of 93.5%, affirming their reliability and consistency. This research underscores the importance of linguistic diversity in NLP and provides a valuable resource for enhancing Emotion Detection capabilities in Bengali and Banglish across digital platforms.
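For readers unfamiliar with the two metrics quoted in the summary, the following is a minimal sketch of how a weighted F1 score and Cohen's Kappa are conventionally computed. It is not the authors' evaluation code. The function names, the label names, and the toy data are assumptions made for illustration only; the record does not name the six emotion classes.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 because n > 0.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n / total) * f1
    return score


def cohen_kappa(rater_a, rater_b):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels (placeholders, not the dataset's actual classes).
gold = ["joy", "joy", "anger", "sad", "sad", "sad"]
pred = ["joy", "anger", "anger", "sad", "sad", "joy"]

print(f"weighted F1: {weighted_f1(gold, pred):.4f}")
print(f"Cohen's kappa: {cohen_kappa(gold, pred):.4f}")
```

In the summary, the weighted F1 compares model predictions against gold labels, while Cohen's Kappa compares two annotators' labels against each other; the toy lists above are reused for both calls purely to keep the sketch short.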
ISSN: 2352-3409
DOI: 10.1016/j.dib.2024.110760