Hate Speech Classifiers Learn Normative Social Stereotypes

Bibliographic Details
Published in: Transactions of the Association for Computational Linguistics, Vol. 11, pp. 300-319
Main Authors: Davani, Aida Mostafazadeh; Atari, Mohammad; Kennedy, Brendan; Dehghani, Morteza
Format: Journal Article
Language: English
Publisher: MIT Press (The MIT Press; MIT Press Journals), One Broadway, 12th Floor, Cambridge, Massachusetts 02142, USA
Published: 22 March 2023
ISSN: 2307-387X
DOI: 10.1162/tacl_a_00550

Summary: Social stereotypes negatively impact individuals’ judgments about different groups and may have a critical role in understanding language directed toward marginalized groups. Here, we assess the role of social stereotypes in the automated detection of hate speech in the English language by examining the impact of social stereotypes on annotation behaviors, annotated datasets, and hate speech classifiers. Specifically, we first investigate the impact of novice annotators’ stereotypes on their hate-speech-annotation behavior. Then, we examine the effect of normative stereotypes in language on the aggregated annotators’ judgments in a large annotated corpus. Finally, we demonstrate how normative stereotypes embedded in language resources are associated with systematic prediction errors in a hate-speech classifier. The results demonstrate that hate-speech classifiers reflect social stereotypes against marginalized groups, which can perpetuate social inequalities when propagated at scale. This framework, combining social-psychological and computational-linguistic methods, provides insights into sources of bias in hate-speech moderation, informing ongoing debates regarding machine learning fairness.