Multilingual emoji prediction using BERT for sentiment analysis

Bibliographic Details
Published in	International Journal of Web Information Systems Vol. 16; no. 3; pp. 265-280
Main Authors	Tomihira, Toshiki; Otsuka, Atsushi; Yamashita, Akihiro; Satoh, Tetsuji
Format	Journal Article
Language	English
Published	Emerald Group Publishing Limited, Bingley, 09.10.2020
Subjects	Accuracy; Application programming interface; Artificial neural networks; Attention; Coders; Context; Data mining; Datasets; Digital media; Emojis; Emotional icons; Emotions; English language; Japanese language; Labels; Mass media; Miscommunication; Natural language processing; Neural networks; Principal components analysis; Sentences; Sentiment analysis; Short term memory; Social media; Social networks; Social research; Transformers; Word meaning; Words (language); Writers
Summary:	Purpose: Emojis have been standardized as Unicode characters, and with the spread of social networking services their use has become common. Emojis are particularly effective for expressing emotion in sentences. Sentiment analysis in natural language processing conventionally relies on sentences whose emotions are labeled manually. By using the emojis contained in text posted on social media, sentiment can instead be predicted without manual labeling. The purpose of this paper is to propose a new model that learns from sentences using emojis as labels, with English and Japanese tweets collected from Twitter as the corpus. The authors verify and compare multiple models based on attention long short-term memory (LSTM), convolutional neural networks (CNN) and Bidirectional Encoder Representations from Transformers (BERT).

Design/methodology/approach: Using the Twitter application programming interface (API), the authors collected tweets containing any of the 2,661 kinds of emoji registered as Unicode characters, a total of 6,149,410 tweets in Japanese. First, the authors visualized the vector space that Word2Vec produces for the emojis, found that emojis lie adjacent to words of similar meaning, and thereby verified that emojis can be used for sentiment analysis. Second, each tweet containing an emoji is used as input, and that emoji serves as its label for training and testing. The authors compared the BERT model with the conventional models (CNN, FastText and attention bidirectional long short-term memory [BiLSTM]) that achieved high scores in previous work.

Findings: Visualizing the Word2Vec vector space, the authors found that emojis and words of similar meaning are adjacent, verifying that emojis can be used for sentiment analysis. The BERT models obtained higher scores than the conventional models, and the experiments demonstrate this improvement in both languages. Emoji prediction in general is strongly influenced by context, and scores may drop when the meaning of a sentence is misunderstood. Because BERT is based on a bidirectional transformer, it can take this context into account.

Practical implications: When a word is typed into an input method editor (IME), emojis can appear among the candidate output words. Current IMEs consider only the most recently entered word, whereas the approach in this study can recommend emojis based on the context of the entire entered sentence. The research could therefore be used to improve IME performance in the future.

Originality/value: This paper focuses on multilingual emoji prediction and is the first attempt to compare emoji prediction between Japanese and English. It is also the first attempt to use the transformer-based BERT model to predict a limited set of emojis, although transformers are known to be effective for a variety of NLP tasks. The authors found that a bidirectional transformer is suitable for emoji prediction.
ISSN:	1744-0084 1744-0092
DOI:	10.1108/IJWIS-09-2019-0042
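The labeling step described in the summary, where the emoji in each tweet serves as that tweet's training label so that no manual annotation is needed, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the five-emoji label set stands in for the 2,661 Unicode emojis the paper used, the rule of keeping only tweets that contain exactly one distinct label emoji is hypothetical, and the helper names make_example and build_dataset are invented for this example.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical label inventory: a tiny stand-in for the 2,661 Unicode emojis
# used in the paper. Each entry here is a single code point.
EMOJI_LABELS = {"😂", "😭", "😍", "👍", "🙏"}


def make_example(tweet: str, labels: set[str] = EMOJI_LABELS) -> tuple[str, str] | None:
    """Turn one tweet into a (text, emoji_label) pair, or None if unusable.

    Assumed rule (not from the paper): keep a tweet only if it contains
    exactly one distinct label emoji, and strip every occurrence of that
    emoji from the input text so the model cannot simply copy the label.
    """
    found = {ch for ch in tweet if ch in labels}
    if len(found) != 1:
        return None  # no label emoji, or several competing ones
    label = found.pop()
    text = "".join(ch for ch in tweet if ch not in labels)
    text = " ".join(text.split())  # collapse whitespace left by removed emojis
    if not text:
        return None  # the tweet was only emoji
    return text, label


def build_dataset(tweets: list[str]) -> tuple[list[tuple[str, str]], Counter]:
    """Build (text, label) pairs plus a count of examples per emoji label."""
    pairs = [p for t in tweets if (p := make_example(t)) is not None]
    return pairs, Counter(label for _, label in pairs)


if __name__ == "__main__":
    pairs, counts = build_dataset(["so funny 😂😂", "面白い😂", "sad 😭", "no emoji"])
    print(pairs)
    print(counts)
```

Tweets with no label emoji, or with several different ones, are skipped under this assumed rule. Works in any language, including Japanese text without spaces. The resulting (text, label) pairs would then be passed to whichever classifier is being compared, such as CNN, FastText, attention BiLSTM or BERT.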