XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages
Multiple business scenarios require an automated generation of descriptive human-readable text from structured input data. Hence, fact-to-text generation systems have been developed for various downstream tasks like generating soccer reports, weather and financial reports, medical reports, person bi...
Saved in:
Main Authors | , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published |
22.09.2022
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Multiple business scenarios require an automated generation of descriptive
human-readable text from structured input data. Hence, fact-to-text generation
systems have been developed for various downstream tasks like generating soccer
reports, weather and financial reports, medical reports, person biographies,
etc. Unfortunately, previous work on fact-to-text (F2T) generation has focused
primarily on English mainly due to the high availability of relevant datasets.
Only recently, the problem of cross-lingual fact-to-text (XF2T) was proposed
for generation across multiple languages alongwith a dataset, XALIGN for eight
languages. However, there has been no rigorous work on the actual XF2T
generation problem. We extend XALIGN dataset with annotated data for four more
languages: Punjabi, Malayalam, Assamese and Oriya. We conduct an extensive
study using popular Transformer-based text generation models on our extended
multi-lingual dataset, which we call XALIGNV2. Further, we investigate the
performance of different text generation strategies: multiple variations of
pretraining, fact-aware embeddings and structure-aware input encoding. Our
extensive experiments show that a multi-lingual mT5 model which uses fact-aware
embeddings with structure-aware input encoding leads to best results on average
across the twelve languages. We make our code, dataset and model publicly
available, and hope that this will help advance further research in this
critical area. |
---|---|
DOI: | 10.48550/arxiv.2209.11252 |