Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks

Bibliographic Details
Main Authors Gruppi, Maurício, Adalı, Sibel, Chen, Pin-Yu
Format Journal Article
Language English
Published 30.01.2021
DOI 10.48550/arxiv.2102.00290

Abstract The use of language is subject to variation over time as well as across social groups and knowledge domains, leading to differences even in the monolingual scenario. Such variation in word usage is often called lexical semantic change (LSC). The goal of LSC is to characterize and quantify language variations with respect to word meaning, to measure how distinct two language sources are (that is, people or language models). Because there is hardly any data available for such a task, most solutions involve unsupervised methods to align two embeddings and predict semantic change with respect to a distance measure. To that end, we propose a self-supervised approach to model lexical semantic change by generating training samples by introducing perturbations of word vectors in the input corpora. We show that our method can be used for the detection of semantic change with any alignment method. Furthermore, it can be used to choose the landmark words to use in alignment and can lead to substantial improvements over the existing techniques for alignment. We illustrate the utility of our techniques using experimental results on three different datasets, involving words with the same or different meanings. Our methods not only provide significant improvements but also can lead to novel findings for the LSC problem.
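
The self-supervision idea in the abstract (create words whose vectors are known to have shifted, then score every word with a distance measure) can be illustrated with a small toy. The sketch below is not the authors' procedure. It assumes the two embeddings already share one vector space, so the alignment step is skipped, and it simulates a shift by moving a word's vector toward a randomly chosen "donor" word. The names `perturb`, `make_self_supervised_samples`, and `best_threshold`, the interpolation strength of 0.7, and the random toy vectors are all illustrative assumptions.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def perturb(vec, donor, strength):
    """Simulate a semantic shift by interpolating vec toward donor (assumed scheme)."""
    return [(1.0 - strength) * a + strength * b for a, b in zip(vec, donor)]


def make_self_supervised_samples(emb, n_shifted, strength, rng):
    """Return (word, second_vector, label) triples; label 1 marks a simulated shift."""
    words = sorted(emb)
    shifted = set(rng.sample(words, n_shifted))
    samples = []
    for w in words:
        if w in shifted:
            donor = rng.choice([x for x in words if x != w])
            samples.append((w, perturb(emb[w], emb[donor], strength), 1))
        else:
            # Unshifted words keep their vector unchanged in this toy setting.
            samples.append((w, list(emb[w]), 0))
    return samples


def best_threshold(scores, labels):
    """Pick the distance threshold that best separates the pseudo-labels."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(scores)):
        hits = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = hits / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Toy embedding: 40 hypothetical words with random 50-dimensional vectors.
rng = random.Random(0)
emb = {f"w{i}": [rng.gauss(0.0, 1.0) for _ in range(50)] for i in range(40)}

samples = make_self_supervised_samples(emb, n_shifted=10, strength=0.7, rng=rng)
scores = [cosine_distance(emb[w], vec) for w, vec, _ in samples]
labels = [label for _, _, label in samples]
threshold, acc = best_threshold(scores, labels)
print(f"threshold={threshold:.3f} accuracy on pseudo-labels={acc:.2f}")
```

In this toy the unshifted vectors are exact copies, so the two groups separate trivially. With two independently trained embeddings, stable words would also show nonzero distances after alignment, and the threshold fitted on the pseudo-labels would have to absorb that noise.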
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
Online Access https://arxiv.org/abs/2102.00290
Resource Type preprint
Subjects Computer Science - Computation and Language; Computer Science - Learning