Do Not Harm Protected Groups in Debiasing Language Representation Models

Bibliographic Details
Main Authors: Zhu, Chloe Qinyu; Stureborg, Rickard; Fain, Brandon
Format: Journal Article
Language: English
Published: 27.10.2023

Summary: Language Representation Models (LRMs) trained with real-world data may capture and exacerbate undesired bias and cause unfair treatment of people in various demographic groups. Several techniques have been investigated for applying interventions to LRMs to remove bias in benchmark evaluations on, for example, word embeddings. However, the negative side effects of debiasing interventions are usually not revealed in downstream tasks. We propose xGAP-DEBIAS, a set of evaluations for assessing the fairness of debiasing. In this work, we examine four debiasing techniques on a real-world text classification task and show that reducing bias comes at the cost of degrading performance for all demographic groups, including those the debiasing techniques aim to protect. We advocate that a debiasing technique should achieve good downstream performance under the constraint of ensuring no harm to the protected group.
DOI: 10.48550/arxiv.2310.18458
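
The summary advocates evaluating debiasing interventions under a "no harm to the protected group" constraint. The Python sketch below illustrates one way such a check could look; it is not the paper's xGAP-DEBIAS evaluation, and the function name, group labels, and accuracy values are hypothetical placeholders.

# Illustrative sketch only: not the paper's xGAP-DEBIAS code; all names and
# numbers below are hypothetical placeholders.
from typing import Dict, List

def satisfies_no_harm(baseline_acc: Dict[str, float],
                      debiased_acc: Dict[str, float],
                      protected_groups: List[str],
                      tolerance: float = 0.0) -> bool:
    """True if the debiased model is at least as accurate as the baseline
    on every protected group (within an optional tolerance)."""
    return all(debiased_acc[g] >= baseline_acc[g] - tolerance
               for g in protected_groups)

# Hypothetical per-group accuracies on a downstream text classification task.
baseline = {"group_a": 0.81, "group_b": 0.78}
debiased = {"group_a": 0.79, "group_b": 0.77}

if satisfies_no_harm(baseline, debiased, ["group_a", "group_b"]):
    print("Debiasing satisfies the no-harm constraint.")
else:
    print("Debiasing degrades performance for a protected group.")

In this reading, a debiasing method would only be considered acceptable if it clears the check for every protected group, in addition to performing well overall.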