Towards Reliable Assessments of Demographic Disparities in Multi-Label Image Classifiers

Bibliographic Details
Published in	arXiv.org
Main Authors	Hall, Melissa; Chern, Bobbie; Gustafson, Laura; Ventura, Denisse; Kulkarni, Harshad; Ross, Candace; Usunier, Nicolas
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 16.02.2023
Subjects	Ablation; Assessments; Classifiers; Computer vision; Data collection; Datasets; Demographics; Image classification; Performance measurement; Qualitative analysis; Reliability analysis; Task complexity

Abstract	Disaggregated performance metrics across demographic groups are a hallmark of fairness assessments in computer vision. These metrics successfully incentivized performance improvements on person-centric tasks such as face analysis and are used to understand risks of modern models. However, there is a lack of discussion on the vulnerabilities of these measurements for more complex computer vision tasks. In this paper, we consider multi-label image classification and, specifically, object categorization tasks. First, we highlight design choices and trade-offs for measurement that involve more nuance than discussed in prior computer vision literature. These challenges are related to the necessary scale of data, definition of groups for images, choice of metric, and dataset imbalances. Next, through two case studies using modern vision models, we demonstrate that naive implementations of these assessments are brittle. We identify several design choices that look merely like implementation details but significantly impact the conclusions of assessments, both in terms of magnitude and direction (on which group the classifiers work best) of disparities. Based on ablation studies, we propose some recommendations to increase the reliability of these assessments. Finally, through a qualitative analysis we find that concepts with large disparities tend to have varying definitions and representations between groups, with inconsistencies across datasets and annotators. While this result suggests avenues for mitigation through more consistent data collection, it also highlights that ambiguous label definitions remain a challenge when performing model assessments. Vision models are expanding and becoming more ubiquitous; it is even more important that our disparity assessments accurately reflect the true performance of models.
Author	Usunier, Nicolas; Ventura, Denisse; Chern, Bobbie; Ross, Candace; Hall, Melissa; Kulkarni, Harshad; Gustafson, Laura
Author_xml	– sequence: 1 givenname: Melissa surname: Hall fullname: Hall, Melissa – sequence: 2 givenname: Bobbie surname: Chern fullname: Chern, Bobbie – sequence: 3 givenname: Laura surname: Gustafson fullname: Gustafson, Laura – sequence: 4 givenname: Denisse surname: Ventura fullname: Ventura, Denisse – sequence: 5 givenname: Harshad surname: Kulkarni fullname: Kulkarni, Harshad – sequence: 6 givenname: Candace surname: Ross fullname: Ross, Candace – sequence: 7 givenname: Nicolas surname: Usunier fullname: Usunier, Nicolas
ContentType	Paper
Copyright	2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DBID	8FE 8FG ABJCF ABUWG AFKRA AZQEC BENPR BGLVJ CCPQU DWQXO HCIFZ L6V M7S PIMPY PQEST PQQKQ PQUKI PRINS PTHSS
DatabaseName	ProQuest SciTech Collection ProQuest Technology Collection Materials Science & Engineering Collection ProQuest Central (Alumni) ProQuest Central ProQuest Central Essentials ProQuest Central Technology Collection ProQuest One Community College ProQuest Central Korea SciTech Premium Collection ProQuest Engineering Collection Engineering Database Publicly Available Content Database ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Academic ProQuest One Academic UKI Edition ProQuest Central China Engineering Collection
DatabaseTitle	Publicly Available Content Database Engineering Database Technology Collection ProQuest Central Essentials ProQuest One Academic Eastern Edition ProQuest Central (Alumni Edition) SciTech Premium Collection ProQuest One Community College ProQuest Technology Collection ProQuest SciTech Collection ProQuest Central China ProQuest Central ProQuest Engineering Collection ProQuest One Academic UKI Edition ProQuest Central Korea Materials Science & Engineering Collection ProQuest One Academic Engineering Collection
DatabaseTitleList	Publicly Available Content Database
Database_xml	– sequence: 1 dbid: 8FG name: ProQuest Technology Collection url: https://search.proquest.com/technologycollection1 sourceTypes: Aggregation Database
DeliveryMethod	fulltext_linktorsrc
Discipline	Physics
EISSN	2331-8422
Genre	Working Paper/Pre-Print
GroupedDBID	8FE 8FG ABJCF ABUWG AFKRA ALMA_UNASSIGNED_HOLDINGS AZQEC BENPR BGLVJ CCPQU DWQXO FRJ HCIFZ L6V M7S M~E PIMPY PQEST PQQKQ PQUKI PRINS PTHSS
ID	FETCH-proquest_journals_27781349693
IEDL.DBID	BENPR
IngestDate	Thu Oct 10 16:20:06 EDT 2024
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-proquest_journals_27781349693
OpenAccessLink	https://www.proquest.com/docview/2778134969?pq-origsite=%requestingapplication%
PQID	2778134969
PQPubID	2050157
ParticipantIDs	proquest_journals_2778134969
PublicationCentury	2000
PublicationDate	20230216
PublicationDateYYYYMMDD	2023-02-16
PublicationDate_xml	– month: 02 year: 2023 text: 20230216 day: 16
PublicationDecade	2020
PublicationPlace	Ithaca
PublicationPlace_xml	– name: Ithaca
PublicationTitle	arXiv.org
PublicationYear	2023
Publisher	Cornell University Library, arXiv.org
Publisher_xml	– name: Cornell University Library, arXiv.org
SSID	ssj0002672553
Score	3.4578073
SecondaryResourceType	preprint
SourceID	proquest
SourceType	Aggregation Database
SubjectTerms	Ablation; Assessments; Classifiers; Computer vision; Data collection; Datasets; Demographics; Image classification; Performance measurement; Qualitative analysis; Reliability analysis; Task complexity
Title	Towards Reliable Assessments of Demographic Disparities in Multi-Label Image Classifiers
URI	https://www.proquest.com/docview/2778134969
hasFullText	1
inHoldings	1
linkProvider	ProQuest