Locating and Mitigating Gender Bias in Large Language Models
Published in: arXiv.org
Main Authors: Yuchen Cai, Ding Cao, Rongxi Guo, Yaqin Wen, Guiquan Liu, Enhong Chen
Format: Paper (working paper / pre-print)
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 21.03.2024
EISSN: 2331-8422
Subjects: Bias; Cognition; Datasets; Gender; Human bias; Large language models; Modules
Online Access: https://www.proquest.com/docview/2973279860
Copyright: 2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract: Large language models (LLMs) are pre-trained on extensive corpora to learn facts and patterns of human cognition, and those corpora carry human preferences. In the process, the models can inadvertently acquire the biases and stereotypes prevalent in society. Prior research has typically tackled bias from a one-dimensional perspective, concentrating either on locating it or on mitigating it, which has kept the two lines of work from complementing and building on one another. In this study, we integrate locating and mitigating bias within a unified framework. First, we use causal mediation analysis to trace the causal effects of different components' activations within a large language model. Building on this, we propose LSDM (Least Square Debias Method), a knowledge-editing-based method for mitigating gender bias in occupational pronouns, and compare it against two baselines on three gender-bias datasets and seven knowledge-competency test datasets. The experimental results indicate that the primary contributors to gender bias are the bottom MLP modules acting on the last token of occupational pronouns and the top attention module acting on the final word in the sentence. Furthermore, LSDM mitigates gender bias more effectively than both baselines while fully preserving the model's capabilities in all other respects.
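The record carries only the abstract, so the paper's implementation of causal mediation analysis is not shown here. The sketch below illustrates the generic technique the abstract names, often realized as activation patching: run a prompt that elicits a gendered pronoun, cache one module's activation, restore it during a run on a neutral prompt, and treat the recovered shift in the " his" vs. " her" logit gap as that module's indirect effect. The model (gpt2), the prompt pair, and the probe tokens are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of causal mediation via activation patching (not the paper's code).
# Idea: record one MLP module's activation on a biased prompt, splice it into a
# run on a neutral prompt, and measure how much of the gendered-pronoun
# preference that module carries (its indirect effect).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def pronoun_gap(prompt, patch_layer=None, cached=None):
    """Return logit(' his') - logit(' her') for the next token.

    If patch_layer is given, replace that MLP's output with `cached`
    (recorded on the other prompt) via a forward hook.
    """
    ids = tok(prompt, return_tensors="pt").input_ids
    handle = None
    if patch_layer is not None:
        mlp = model.transformer.h[patch_layer].mlp
        handle = mlp.register_forward_hook(lambda m, i, o: cached)
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    if handle:
        handle.remove()
    his = tok.encode(" his")[0]
    her = tok.encode(" her")[0]
    return (logits[his] - logits[her]).item()

def record_mlp(prompt, layer):
    """Cache one MLP module's output activation on a prompt."""
    store = {}
    mlp = model.transformer.h[layer].mlp
    h = mlp.register_forward_hook(lambda m, i, o: store.update(out=o))
    with torch.no_grad():
        model(tok(prompt, return_tensors="pt").input_ids)
    h.remove()
    return store["out"]

# Illustrative prompt pair (assumption): same template and token count,
# occupation vs. neutral subject.
biased, neutral = "The nurse said that", "The person said that"
for layer in range(model.config.n_layer):
    cached = record_mlp(biased, layer)
    effect = pronoun_gap(neutral, patch_layer=layer, cached=cached) - pronoun_gap(neutral)
    print(f"layer {layer:2d} indirect effect on his-her gap: {effect:+.3f}")
```

Sweeping such measurements over layers and token positions is what lets an analysis of this kind attribute bias to specific components, such as the bottom MLP modules and top attention module the abstract reports; for brevity this sketch patches whole-module outputs at every position rather than individual tokens.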
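LSDM itself is described in this record only as a knowledge-editing-based, least-squares method, so the following is a hedged numpy sketch of the closed-form least-squares weight edit that this family of editors (ROME-style methods) commonly uses, not LSDM's actual algorithm: pick gender-neutral target values for the keys of occupational terms and solve for the updated weight in one step, with a covariance term estimated from unrelated text so other knowledge is preserved. All names, shapes, and the random stand-in data are assumptions.

```python
# Hedged numpy sketch of a least-squares weight edit (not LSDM's actual code).
# A linear layer computes v = W k. We want new targets V* for a few keys K
# while disturbing the layer as little as possible elsewhere, captured by a
# covariance matrix C estimated from unrelated activations:
#
#   W_new = argmin_W ||W K - V*||^2 + ||(W - W_old) C^{1/2}||^2
#
# which has the closed-form normal-equation solution used below.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_edit, n_ref = 64, 32, 4, 1000   # illustrative sizes

W_old = rng.normal(size=(d_out, d_in))          # pre-trained weight
K = rng.normal(size=(d_in, n_edit))             # keys for occupational terms
V_star = rng.normal(size=(d_out, n_edit))       # gender-neutral target values
K_ref = rng.normal(size=(d_in, n_ref))          # keys from unrelated text
C = K_ref @ K_ref.T / n_ref                     # preservation covariance

# Setting the gradient to zero gives  W_new (K K^T + C) = V* K^T + W_old C.
A = K @ K.T + C
B = V_star @ K.T + W_old @ C
W_new = np.linalg.solve(A.T, B.T).T             # solves W_new @ A = B

# The edit hits the targeted keys...
print("edit error:", np.linalg.norm(W_new @ K - V_star))
# ...while barely moving outputs on reference keys.
print("drift on reference keys:", np.linalg.norm((W_new - W_old) @ K_ref) / n_ref)
```

The covariance term is what a preservation guarantee of the kind the abstract reports rests on: directions that unrelated text exercises heavily are expensive for the solver to move, so the change budget is spent on the targeted occupational keys rather than on the model's general capabilities.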