Locating and Mitigating Gender Bias in Large Language Models

Bibliographic Details
Published in: arXiv.org
Main Authors: Cai, Yuchen; Cao, Ding; Guo, Rongxi; Wen, Yaqin; Liu, Guiquan; Chen, Enhong
Format: Paper (Working Paper/Pre-Print)
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 21.03.2024
EISSN: 2331-8422
Subjects: Bias; Cognition; Datasets; Gender; Human bias; Large language models; Modules
Online Access: https://www.proquest.com/docview/2973279860
Copyright: 2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract: Large language models (LLMs) are pre-trained on extensive corpora to learn facts and aspects of human cognition, and those corpora inevitably encode human preferences. This process can therefore lead models to inadvertently acquire the biases and stereotypes prevalent in society. Prior research has typically tackled bias from a one-dimensional perspective, concentrating on either locating it or mitigating it. This limited perspective has made it difficult for the two lines of bias research to complement and progressively build upon one another. In this study, we integrate the processes of locating and mitigating bias within a unified framework. First, we use causal mediation analysis to trace the causal effects of the activations of different components within a large language model. Building on this, we propose LSDM (Least Square Debias Method), a knowledge-editing-based method for mitigating gender bias in occupational pronouns, and compare it against two baselines on three gender-bias datasets and seven knowledge-competency test datasets. The experimental results indicate that the primary contributors to gender bias are the bottom MLP modules acting on the last token of the occupational pronoun and the top attention module acting on the final word in the sentence. Furthermore, LSDM mitigates gender bias more effectively than the baselines while fully preserving the model's capabilities in all other respects.
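
The locating step summarized above is a causal mediation analysis over model components. As a purely illustrative sketch, assuming gpt2 as the model, a she/he next-token probe, and zero-ablation of each MLP block as the intervention (the paper's own analysis traces causal effects between runs, which this simplifies):

# Illustrative sketch of locating bias via activation interventions.
# Assumptions (not from the paper): gpt2 as the model, a she/he logit
# probe, and zero-ablation of each MLP block as the intervention.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "The nurse said that"  # occupational context

def she_he_ratio() -> float:
    """P(' she') / P(' he') for the next token: a crude gender-bias probe."""
    with torch.no_grad():
        logits = model(**tok(prompt, return_tensors="pt")).logits
    probs = torch.softmax(logits[0, -1], dim=-1)
    she_id = tok.encode(" she")[0]
    he_id = tok.encode(" he")[0]
    return (probs[she_id] / probs[he_id]).item()

baseline = she_he_ratio()

# Ablate one MLP block at a time; the change in the probe is a rough
# proxy for how much that module mediates the gendered prediction.
effects = {}
for layer in range(model.config.n_layer):
    hook = model.transformer.h[layer].mlp.register_forward_hook(
        lambda module, inputs, output: torch.zeros_like(output)
    )
    effects[layer] = baseline - she_he_ratio()
    hook.remove()

for layer, effect in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"layer {layer:2d}: effect on she/he ratio = {effect:+.4f}")

Modules whose ablation moves the probe the most are the candidate mediators of the gendered prediction; the abstract reports these to be the bottom MLP modules at the occupational token and the top attention module at the final word.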
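
The abstract describes LSDM only as a knowledge-editing-based least-squares method, so the following is a minimal sketch of a generic least-squares weight edit of that family, not the authors' exact objective; the dimensions, random stand-in data, and the preservation term are all assumptions:

# Illustrative least-squares weight edit in the spirit of knowledge-editing
# debiasing. All names and data are stand-ins, not the paper's setup:
# W is a toy MLP projection, K holds key vectors for biased occupational
# contexts, V_star holds gender-balanced target values, and K0 holds keys
# whose outputs the edit should preserve.
import numpy as np

rng = np.random.default_rng(0)
d = 64
W = rng.normal(size=(d, d))            # original projection weights
K0 = rng.normal(size=(d, 100))         # keys to preserve
K = rng.normal(size=(d, 5))            # keys for biased contexts
V_star = rng.normal(size=(d, 5))       # debiased target values

# Solve  min_{W'} ||W' K0 - W K0||_F^2 + ||W' K - V_star||_F^2.
# Setting the gradient to zero gives the normal equations
#   W' (K0 K0^T + K K^T) = W K0 K0^T + V_star K^T.
A = K0 @ K0.T + K @ K.T
B = W @ K0 @ K0.T + V_star @ K.T
W_new = np.linalg.solve(A.T, B.T).T    # A is symmetric, so A.T == A

print("edit residual:      ", np.linalg.norm(W_new @ K - V_star))
print("preservation drift: ", np.linalg.norm((W_new - W) @ K0))

Restricting such an edit to the modules flagged by the mediation analysis, and to keys for occupational contexts, is what lets knowledge-editing methods change the biased behavior while leaving unrelated capabilities largely intact, consistent with the preservation result the abstract reports.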