Unifying Structured Data as Graph for Data-to-Text Pre-Training
| Published in | Transactions of the Association for Computational Linguistics, Vol. 12, pp. 210-228 |
|---|---|
| Main Authors | , , , , , , , , , , |
| Format | Journal Article |
| Language | English |
| Published | The MIT Press, One Broadway, 12th Floor, Cambridge, Massachusetts 02142, USA, 08.03.2024 |
| Summary | Data-to-text (D2T) generation aims to transform structured data into natural language text. Data-to-text pre-training has proven powerful for enhancing D2T generation and yields impressive performance. However, previous pre-training methods either oversimplified structured data into a sequence without considering input structures or designed training objectives tailored to a specific data structure (e.g., table or knowledge graph). In this paper, we unify different types of structured data (i.e., table, key-value data, knowledge graph) into the graph format and cast different D2T generation tasks as graph-to-text generation. To effectively exploit the structural information of the input graph, we propose a structure-enhanced pre-training method for D2T generation by designing a structure-enhanced Transformer. Concretely, we devise a position matrix for the Transformer, encoding relative positional information of connected nodes in the input graph. In addition, we propose a new attention matrix to incorporate graph structures into the original Transformer by taking the available explicit connectivity structure into account. Extensive experiments on six benchmark datasets show the effectiveness of our model. Our source code is available at . |
| Bibliography | 2024 |
| ISSN | 2307-387X |
| DOI | 10.1162/tacl_a_00641 |
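The abstract describes two structural components: an attention matrix that restricts attention to explicitly connected graph nodes, and a position matrix encoding relative positions between connected nodes. The sketch below is an illustrative reconstruction of that idea, not the paper's released code: the shortest-path distances, the sentinel value, and the function names (`build_structure_matrices`, `masked_attention`) are assumptions made for demonstration.

```python
# Illustrative sketch of structure-enhanced attention for a node-linearized
# graph input. This reconstructs the general idea from the abstract only;
# the actual position/attention matrix definitions in the paper may differ.
import numpy as np

def build_structure_matrices(num_nodes, edges):
    """Return (attention_mask, position_matrix) for a graph input.

    attention_mask[i, j] = 1 if node j is visible to node i (self, or an
    explicit edge in either direction), else 0.
    position_matrix[i, j] = shortest-path distance between nodes i and j
    (a large sentinel when unreachable), a simple stand-in for relative
    positional information between connected nodes.
    """
    INF = num_nodes + 1  # sentinel for "not connected"
    mask = np.eye(num_nodes, dtype=np.int64)
    dist = np.full((num_nodes, num_nodes), INF, dtype=np.int64)
    np.fill_diagonal(dist, 0)
    for u, v in edges:
        mask[u, v] = mask[v, u] = 1
        dist[u, v] = dist[v, u] = 1
    # Floyd-Warshall: all-pairs shortest paths over the undirected graph.
    for k in range(num_nodes):
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return mask, dist

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to graph-connected pairs."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask == 1, scores, -1e9)  # hide disconnected nodes
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Tiny knowledge graph: nodes 0-1 and 1-2 connected; node 3 isolated.
mask, dist = build_structure_matrices(4, [(0, 1), (1, 2)])
print(dist[0, 2])  # path 0-1-2 has length 2
```

In this sketch the mask zeroes out attention between disconnected nodes (an isolated node attends only to itself), while the distance matrix could be bucketed into learned relative-position embeddings added to the attention scores, analogous to relative position encodings in standard Transformers.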