Predictive intelligence in harmful news identification by BERT-based ensemble learning model with text sentiment analysis


Bibliographic Details
Published in Information Processing & Management Vol. 59; no. 2; p. 102872
Main Authors Lin, Szu-Yin, Kung, Yun-Ching, Leu, Fang-Yie
Format Journal Article
Language English
Published Oxford Elsevier Ltd 01.03.2022
Elsevier Science Ltd

Summary:
• Harmful news is defined as explicit or implicit harmful speech in news text that harms people or affects readers' perception.
• A harmful news identification dataset is established and a correlation analysis is conducted.
• A BERT-based model that applies ensemble learning methods with text sentiment analysis is proposed to identify harmful news.
• Results show that the F1-score of the proposed model reaches 66.3%, an increase of 7.8% compared with that of the previous approach.

In an environment full of disordered information, the media spreads fake or harmful information into the public arena faster than ever before. A news report should ideally be neutral and factual, without excessive personal emotions or viewpoints. News articles should not be written with malicious intent or be used to create media framing. Harmful news is defined as explicit or implicit harmful speech in news text that harms people or affects readers' perception. At present, however, it is difficult to identify and predict fake or harmful news effectively in advance, and harmful news in particular. In this study, we therefore propose a Bidirectional Encoder Representations from Transformers (BERT) based model that applies ensemble learning methods with text sentiment analysis to identify harmful news. The aim is to give readers a way to identify harmful news content and so help them judge whether the information is presented in a neutral manner.

The working model of the proposed system has two phases. The first phase collects harmful news and establishes a development model for analyzing the correlation between text sentiment and harmful news. The second phase identifies harmful news by analyzing text sentiment with an ensemble learning technique and the BERT model. The purpose is to determine whether the news has harmful intentions.

Our experimental results show that the F1-score of the proposed model reaches 66.3%, an increase of 7.8% compared with that of the previous term frequency-inverse document frequency approach, which adopts a Lagrangian Support Vector Machine (LSVM) model without using text sentiment. Moreover, the proposed method achieves better performance in recognizing various cases of information disorder.
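The summary does not say which ensemble method the authors use or how the sentiment signal is combined with BERT. As a rough illustration only, the sketch below assumes soft voting (averaging per-model probabilities of the "harmful" class) and then computes the F1-score used as the evaluation metric. The function names, the 0.5 threshold, and all probabilities and labels are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> list:
    """Weighted average of per-model 'harmful' probabilities for each article.

    Soft voting is an assumed ensemble rule, not the paper's documented method.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_items = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_items)
    ]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (harmful = 1) class: 2PR / (P + R)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of two ensemble members for four articles.
bert_probs = [0.9, 0.6, 0.2, 0.7]        # e.g. a BERT text classifier
sentiment_probs = [0.7, 0.7, 0.1, 0.4]   # e.g. a sentiment-based classifier

combined = soft_vote([bert_probs, sentiment_probs])
predictions = [1 if p >= 0.5 else 0 for p in combined]  # assumed threshold
labels = [1, 1, 0, 0]                                   # made-up ground truth

score = f1_score(labels, predictions)
```

Soft voting keeps each member's confidence, so a strong sentiment signal can tip a borderline text prediction. Hard (majority) voting or stacking would be equally plausible readings of "ensemble learning" here.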
ISSN:0306-4573
1873-5371
DOI:10.1016/j.ipm.2022.102872