Assessing Demographic Bias Transfer from Dataset to Model: A Case Study in Facial Expression Recognition


Bibliographic Details
Main Authors: Dominguez-Catena, Iris; Paternain, Daniel; Galar, Mikel
Format: Journal Article
Language: English
Published: 20.05.2022
Summary: Proceedings of the Workshop on Artificial Intelligence Safety 2022 (AISafety 2022). The increasing number of applications of Artificial Intelligence (AI) has led researchers to study the social impact of these technologies and evaluate their fairness. Unfortunately, current fairness metrics are hard to apply in multi-class multi-demographic classification problems, such as Facial Expression Recognition (FER). We propose a new set of metrics to approach these problems. Of the three metrics proposed, two focus on the representational and stereotypical bias of the dataset, and the third on the residual bias of the trained model. Combined, these metrics can potentially be used to study and compare diverse bias mitigation methods. We demonstrate the usefulness of the metrics by applying them to a FER problem based on the popular Affectnet dataset. Like many other datasets for FER, Affectnet is a large Internet-sourced dataset with 291,651 labeled images. Obtaining images from the Internet raises concerns over the fairness of any system trained on this data and its ability to generalize properly to diverse populations. We first analyze the dataset and some variants, finding substantial racial bias and gender stereotypes. We then extract several subsets with different demographic properties and train a model on each one, observing the amount of residual bias in the different setups. We also provide a second analysis on a different dataset, FER+.
DOI: 10.48550/arxiv.2205.10049