Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 04.12.2022 |
Subjects | |
Online Access | Get full text |
Summary: | Multi-lingual language models (LMs) such as mBERT, XLM-R, mT5, and mBART have been remarkably successful in enabling natural language tasks in low-resource languages through cross-lingual transfer from high-resource ones. In this work, we try to better understand how such models, specifically mT5, transfer *any* linguistic and semantic knowledge across languages, even though no explicit cross-lingual signals are provided during pre-training. Rather, only unannotated texts from each language are presented to the model separately and independently of one another, and the model appears to implicitly learn cross-lingual connections. This raises several questions that motivate our study, such as: Are the cross-lingual connections between every language pair equally strong? What properties of the source and target languages impact the strength of cross-lingual transfer? Can we quantify the impact of those properties on cross-lingual transfer?

In our investigation, we analyze a pre-trained mT5 model to discover the attributes of the cross-lingual connections it has learned. Through a statistical interpretation framework over 90 language pairs across three tasks, we show that transfer performance can be modeled by a few linguistic and data-derived features. These observations enable us to interpret the cross-lingual understanding of the mT5 model, to choose the best source language for a task, and to anticipate its training data demands. A key finding of this work is that similarities of syntax, morphology, and phonology are good predictors of cross-lingual transfer, significantly more so than the mere lexical similarity of languages. For a given language, we can predict zero-shot performance, which then increases on a logarithmic scale with the number of few-shot target-language data points. |
DOI: | 10.48550/arxiv.2212.01757 |
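
The abstract states that transfer performance over 90 language pairs can be modeled by a few linguistic and data-derived features, but this record does not include the paper's actual framework. The sketch below is only an illustration of what such a feature-based regression could look like; the feature names, weights, and data are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: models a per-language-pair transfer score as a
# linear function of a few similarity features, mirroring the kind of
# feature-based analysis the abstract describes. All numbers are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 90  # the study covers 90 source-target language pairs

# Hypothetical per-pair features: syntactic, morphological, and phonological
# similarity plus a (normalised) log pre-training data size for the source.
X = rng.uniform(0.0, 1.0, size=(n_pairs, 4))
true_w = np.array([0.35, 0.25, 0.20, 0.10])             # assumed weights, not from the paper
y = 0.3 + X @ true_w + rng.normal(0.0, 0.02, n_pairs)   # synthetic transfer scores

# Ordinary least squares: transfer_score ~ intercept + features
A = np.column_stack([np.ones(n_pairs), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
for name, w in zip(["intercept", "syntax_sim", "morph_sim",
                    "phono_sim", "log_data_size"], coef):
    print(f"{name:>14s}: {w:+.3f}")
```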
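The abstract also notes that, for a given language, predicted zero-shot performance increases on a logarithmic scale with the number of few-shot target-language data points. A minimal sketch of fitting that kind of log-linear curve, again with purely synthetic numbers, might look like this:

```python
# Illustrative sketch only: fit score(n) ≈ a + b * log(n), where a plays the
# role of the (near) zero-shot level and n is the few-shot example count.
import numpy as np

# Hypothetical scores for one target language at increasing few-shot sizes.
n_examples = np.array([10, 50, 100, 500, 1000, 5000])
scores = np.array([0.42, 0.49, 0.52, 0.59, 0.62, 0.69])

b, a = np.polyfit(np.log(n_examples), scores, deg=1)  # slope first, then intercept
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
print(f"predicted score at n = 2000: {a + b * np.log(2000):.3f}")
```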