Towards Reliable Assessments of Demographic Disparities in Multi-Label Image Classifiers


Bibliographic Details
Published in arXiv.org
Main Authors Hall, Melissa; Chern, Bobbie; Gustafson, Laura; Ventura, Denisse; Kulkarni, Harshad; Ross, Candace; Usunier, Nicolas
Format Paper
Language English
Published Ithaca: Cornell University Library, arXiv.org, 16.02.2023
Abstract Disaggregated performance metrics across demographic groups are a hallmark of fairness assessments in computer vision. These metrics successfully incentivized performance improvements on person-centric tasks such as face analysis and are used to understand risks of modern models. However, there is a lack of discussion on the vulnerabilities of these measurements for more complex computer vision tasks. In this paper, we consider multi-label image classification and, specifically, object categorization tasks. First, we highlight design choices and trade-offs for measurement that involve more nuance than discussed in prior computer vision literature. These challenges are related to the necessary scale of data, definition of groups for images, choice of metric, and dataset imbalances. Next, through two case studies using modern vision models, we demonstrate that naive implementations of these assessments are brittle. We identify several design choices that look merely like implementation details but significantly impact the conclusions of assessments, both in terms of magnitude and direction (on which group the classifiers work best) of disparities. Based on ablation studies, we propose some recommendations to increase the reliability of these assessments. Finally, through a qualitative analysis we find that concepts with large disparities tend to have varying definitions and representations between groups, with inconsistencies across datasets and annotators. While this result suggests avenues for mitigation through more consistent data collection, it also highlights that ambiguous label definitions remain a challenge when performing model assessments. Vision models are expanding and becoming more ubiquitous; it is even more important that our disparity assessments accurately reflect the true performance of models.
Copyright 2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline Physics
EISSN 2331-8422
Genre Working Paper/Pre-Print
Open Access true
Peer Reviewed false
Scholarly false
Subject Terms Ablation
Assessments
Classifiers
Computer vision
Data collection
Datasets
Demographics
Image classification
Performance measurement
Qualitative analysis
Reliability analysis
Task complexity
URI https://www.proquest.com/docview/2778134969