Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks

Bibliographic Details
Main Authors	Gruppi, Maurício; Adalı, Sibel; Chen, Pin-Yu
Format	Journal Article
Language	English
Published	30.01.2021
Subjects	Computer Science - Computation and Language; Computer Science - Learning
DOI	10.48550/arxiv.2102.00290

Abstract	The use of language is subject to variation over time as well as across social groups and knowledge domains, leading to differences even in the monolingual scenario. Such variation in word usage is often called lexical semantic change (LSC). The goal of LSC is to characterize and quantify language variations with respect to word meaning, to measure how distinct two language sources are (that is, people or language models). Because there is hardly any data available for such a task, most solutions involve unsupervised methods to align two embeddings and predict semantic change with respect to a distance measure. To that end, we propose a self-supervised approach to model lexical semantic change by generating training samples by introducing perturbations of word vectors in the input corpora. We show that our method can be used for the detection of semantic change with any alignment method. Furthermore, it can be used to choose the landmark words to use in alignment and can lead to substantial improvements over the existing techniques for alignment. We illustrate the utility of our techniques using experimental results on three different datasets, involving words with the same or different meanings. Our methods not only provide significant improvements but also can lead to novel findings for the LSC problem.
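The pipeline the abstract describes (align two embeddings, score each word by a distance measure, and create training samples by perturbing word vectors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: orthogonal Procrustes stands in for "any alignment method", cosine distance for the distance measure, and the perturbation (interpolating a word's vector toward that of another word) is a hypothetical stand-in for the perturbations the paper introduces. All names, sizes, and the synthetic data are made up for the example.

```python
import numpy as np

def procrustes_align(A, B, landmarks):
    """Map A into B's space with the orthogonal matrix Q that minimizes
    ||A[landmarks] @ Q - B[landmarks]||_F (orthogonal Procrustes)."""
    M = A[landmarks].T @ B[landmarks]
    U, _, Vt = np.linalg.svd(M)
    return A @ (U @ Vt)

def cosine_distance(X, Y):
    """Row-wise cosine distance between two equally shaped matrices."""
    num = np.sum(X * Y, axis=1)
    den = np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1)
    return 1.0 - num / den

rng = np.random.default_rng(0)
n_words, dim = 200, 50

# Synthetic "source B" embedding; rows are word vectors (assumption).
B = rng.normal(size=(n_words, dim))

# "Source A" is the same vocabulary in an arbitrarily rotated space,
# which mimics two embeddings trained independently.
R, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
A = B @ R

# Simulate semantic shift (assumed perturbation): move ten words most of
# the way toward the vectors of ten other, unrelated words.
shifted = np.arange(10)
donors = np.arange(100, 110)
r = 0.8
A[shifted] = ((1.0 - r) * B[shifted] + r * B[donors]) @ R

# Use the unperturbed words as landmarks, then score every word.
stable = np.arange(10, n_words)
A_aligned = procrustes_align(A, B, stable)
distances = cosine_distance(A_aligned, B)

print("mean distance, stable words :", distances[stable].mean())
print("mean distance, shifted words:", distances[shifted].mean())
```

Because the perturbed words are known by construction, their distances can serve as positive training samples for a change classifier or as a signal for choosing landmarks; how the paper actually uses them is described in the full text, not in this sketch.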
Copyright	http://arxiv.org/licenses/nonexclusive-distrib/1.0
OpenAccessLink	https://arxiv.org/abs/2102.00290
SecondaryResourceType	preprint