MEDs for PETs: Multilingual Euphemism Disambiguation for Potentially Euphemistic Terms
Main Authors | , , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 25.01.2024 |
Summary: | This study investigates the computational processing of euphemisms, a universal linguistic phenomenon, across multiple languages. We train a multilingual transformer model (XLM-RoBERTa) to disambiguate potentially euphemistic terms (PETs) in multilingual and cross-lingual settings. In line with current trends, we demonstrate that zero-shot learning across languages takes place. We also show cases where multilingual models outperform monolingual models on the task by a statistically significant margin, indicating that multilingual data presents additional opportunities for models to learn cross-lingual, computational properties of euphemisms. In a follow-up analysis, we focus on universal euphemistic "categories" such as death and bodily functions. To further understand the nature of the cross-lingual transfer, we test whether cross-lingual data from the same domain is more important than within-language data from other domains. |
---|---|
DOI: | 10.48550/arxiv.2401.14526 |
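
The summary frames PET disambiguation as a supervised classification task for a fine-tuned XLM-RoBERTa, but gives no implementation details. As a minimal sketch of what such a setup might look like, the following fine-tunes `xlm-roberta-base` as a binary sentence classifier with the Hugging Face Transformers library; the label scheme (1 = euphemistic, 0 = literal), the toy examples, and the training loop are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (assumptions, not the authors' code): fine-tune XLM-RoBERTa
# to label a sentence containing a potentially euphemistic term (PET) as
# euphemistic (1) or literal (0).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2
)

# Toy multilingual examples; in the zero-shot setting the summary describes,
# the model would be fine-tuned on some languages and evaluated on others.
sentences = [
    "He passed away last night.",         # "passed away": euphemism for dying
    "Ella pasó el examen sin problema.",  # Spanish, literal use of "pasar"
]
labels = torch.tensor([1, 0])  # assumed scheme: 1 = euphemistic, 0 = literal

batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # built-in cross-entropy loss
outputs.loss.backward()
optimizer.step()  # one illustrative training step
```

Under these assumptions, zero-shot cross-lingual evaluation amounts to running the same fine-tuned classifier on PET sentences in a language held out of training, which is the transfer effect the abstract reports.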