Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias
Published in | arXiv.org |
---|---|
Main Authors | Bartl, Marion; Nissim, Malvina; Gatt, Albert |
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 27.10.2020 |
Subjects | Bias; Gender; Human bias; Languages; Measurement methods; Morphology; Words (language) |
Online Access | https://www.proquest.com/docview/2455649494 |
Abstract | Contextualized word embeddings have been replacing standard embeddings as the representational knowledge source of choice in NLP systems. Since a variety of biases have previously been found in standard word embeddings, it is crucial to assess biases encoded in their replacements as well. Focusing on BERT (Devlin et al., 2018), we measure gender bias by studying associations between gender-denoting target words and names of professions in English and German, comparing the findings with real-world workforce statistics. We mitigate bias by fine-tuning BERT on the GAP corpus (Webster et al., 2018), after applying Counterfactual Data Substitution (CDS) (Maudslay et al., 2019). We show that our method of measuring bias is appropriate for languages such as English, but not for languages with a rich morphology and gender-marking, such as German. Our results highlight the importance of investigating bias and mitigation techniques cross-linguistically, especially in view of the current emphasis on large-scale, multilingual language models. |
---|---|
Author | Bartl, Marion; Nissim, Malvina; Gatt, Albert |
Copyright | 2020. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
EISSN | 2331-8422 |
Genre | Working Paper/Pre-Print |
SubjectTerms | Bias; Gender; Human bias; Languages; Measurement methods; Morphology; Words (language) |
URI | https://www.proquest.com/docview/2455649494 |
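
The abstract names Counterfactual Data Substitution (CDS) as the preprocessing step applied to the GAP corpus before fine-tuning. As a rough illustration only, and not the authors' implementation, the sketch below assumes CDS amounts to swapping gendered words in place in a randomly chosen share of sentences. The word-pair list, the function names `swap_gender` and `counterfactual_data_substitution`, and the default probability of 0.5 are all assumptions made for this example.

```python
import random
import re

# Hypothetical, deliberately tiny word-pair list (illustration only).
# Ambiguous forms such as "her" (his/him) are left out on purpose.
PAIRS = [
    ("he", "she"),
    ("man", "woman"),
    ("father", "mother"),
    ("son", "daughter"),
    ("brother", "sister"),
]

# Build a symmetric lookup so each word maps to its counterpart.
SWAP = {}
for masculine, feminine in PAIRS:
    SWAP[masculine] = feminine
    SWAP[feminine] = masculine

WORD = re.compile(r"\b[A-Za-z]+\b")


def swap_gender(sentence):
    """Replace every listed gendered word with its counterpart."""

    def replace(match):
        word = match.group(0)
        counterpart = SWAP.get(word.lower())
        if counterpart is None:
            return word  # not a listed gendered word: keep unchanged
        # Preserve a leading capital (e.g. at the start of a sentence).
        return counterpart.capitalize() if word[0].isupper() else counterpart

    return WORD.sub(replace, sentence)


def counterfactual_data_substitution(sentences, p=0.5, seed=0):
    """Swap each sentence in place with probability p (assumed default 0.5).

    Sentences are substituted rather than duplicated, so the corpus
    keeps its original size.
    """
    rng = random.Random(seed)
    return [swap_gender(s) if rng.random() < p else s for s in sentences]


if __name__ == "__main__":
    corpus = ["He thanked the woman.", "The mother called her son."]
    for line in counterfactual_data_substitution(corpus):
        print(line)
```

A realistic setup would need a much larger curated word list, handling of ambiguous pronouns and of names, and, as the abstract notes for German, a different approach for languages with rich gender morphology, where a word-level swap would leave agreement errors.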