Everything Is All It Takes: A Multipronged Strategy for Zero-Shot Cross-Lingual Information Extraction
Format | Journal Article |
Language | English |
Published | 14.09.2021 |
Summary: | Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English. While the advance of pretrained multilingual encoders suggests an easy optimism of "train on English, run on any language", we find through a thorough exploration and extension of techniques that a combination of approaches, both new and old, leads to better performance than any one cross-lingual strategy in particular. We explore techniques including data projection and self-training, and how different pretrained encoders impact them. We use English-to-Arabic IE as our initial example, demonstrating strong performance in this setting for event extraction, named entity recognition, part-of-speech tagging, and dependency parsing. We then apply data projection and self-training to three tasks across eight target languages. Because no single set of techniques performs the best across all tasks, we encourage practitioners to explore various configurations of the techniques described in this work when seeking to improve on zero-shot training. |
DOI: | 10.48550/arxiv.2109.06798 |
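As a concrete illustration of the data projection technique named in the summary, the following is a minimal sketch, not the authors' implementation. It assumes word alignments between an English sentence and its target-language translation are already available (for example, from an external word aligner) and simply carries BIO named-entity tags across those alignments. All function and variable names here are hypothetical.

```python
# A minimal sketch of annotation projection (data projection) for zero-shot
# cross-lingual IE, assuming word alignments between the English sentence and
# its target-language translation are given (e.g., by an external aligner).
# This illustrates the general idea, not the paper's pipeline.

def project_bio_tags(src_tags, alignment, tgt_len):
    """Carry BIO tags from source (English) tokens onto target tokens.

    src_tags:  BIO tags for the English sentence, e.g. ["B-PER", "I-PER", "O"]
    alignment: (src_index, tgt_index) pairs from a word aligner
    tgt_len:   number of tokens in the target sentence
    """
    tgt_tags = ["O"] * tgt_len
    # Visit aligned target positions left to right so spans stay contiguous.
    for src_i, tgt_j in sorted(alignment, key=lambda pair: pair[1]):
        tag = src_tags[src_i]
        if tag == "O":
            continue
        label = tag.split("-", 1)[1]
        # Extend the span if the previous target token already carries this label.
        if tgt_j > 0 and tgt_tags[tgt_j - 1] in (f"B-{label}", f"I-{label}"):
            tgt_tags[tgt_j] = f"I-{label}"
        else:
            tgt_tags[tgt_j] = f"B-{label}"
    return tgt_tags


if __name__ == "__main__":
    # English "Barack Obama visited Cairo" aligned to a hypothetical
    # 4-token target sentence with different word order.
    src_tags = ["B-PER", "I-PER", "O", "B-LOC"]
    alignment = [(0, 1), (1, 2), (2, 0), (3, 3)]  # (english_idx, target_idx)
    print(project_bio_tags(src_tags, alignment, tgt_len=4))
    # -> ['O', 'B-PER', 'I-PER', 'B-LOC']
```

Self-training, the other technique highlighted in the summary, can be sketched just as loosely: a tagger trained on English labels unannotated target-language text, and only high-confidence predictions are kept as pseudo-labeled training data. The `predict_with_confidence` method below is an assumed placeholder interface, not a real library API.

```python
# One round of self-training, sketched under the assumptions above: the
# `predict_with_confidence` method is a hypothetical placeholder.

def self_train_round(model, unlabeled_sentences, threshold=0.9):
    pseudo_labeled = []
    for sentence in unlabeled_sentences:
        tags, confidence = model.predict_with_confidence(sentence)  # hypothetical
        if confidence >= threshold:  # keep only confident pseudo-labels
            pseudo_labeled.append((sentence, tags))
    # Callers would merge these with the gold English data and retrain.
    return pseudo_labeled
```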