CCPM: A Chinese Classical Poetry Matching Dataset
Main Authors | , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 03.06.2021 |
Summary: | Poetry is one of the most important art forms of human language. Many
recent studies have incorporated linguistic features of poetry, such as style
and sentiment, into systems for understanding or generating it. However, no
work has focused on understanding or evaluating the semantics of poetry.
Therefore, we propose a novel task that assesses a model's semantic
understanding of poetry through poem matching. Specifically, given the modern
Chinese translation of a line of poetry, the model must select the matching
line of Chinese classical poetry from four candidates. To construct the
dataset, we first obtain parallel data of Chinese classical poetry and its
modern Chinese translation. We then retrieve lines from a poetry corpus that
are similar to the correct line and use them as negative choices. We name the
dataset the Chinese Classical Poetry Matching Dataset (CCPM) and release it at
https://github.com/THUNLP-AIPoet/CCPM. We hope this dataset will advance
research on incorporating deep semantics into systems for understanding and
generating Chinese classical poetry. We also run two variants of BERT on the
dataset as preliminary baselines. |
---|---|
DOI: | 10.48550/arxiv.2106.01979 |
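
The matching task described in the summary can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the field names (`translation`, `choices`, `answer`), the helper names, the toy corpus, and the modern-Chinese paraphrase are hypothetical, and character-level Jaccard overlap is only a stand-in for whatever retrieval method the authors actually used, which the summary does not specify. The sketch builds one four-choice example by retrieving the lines most similar to the correct line as negatives, then scores a trivial overlap-based predictor (not the BERT baselines mentioned above).

```python
import random


def char_jaccard(a, b):
    """Character-level Jaccard similarity (an assumed stand-in for retrieval)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def build_example(translation, gold_line, corpus, num_negatives=3, seed=0):
    """Pair a translation with its gold line plus the most similar other lines."""
    candidates = [line for line in corpus if line != gold_line]
    # Most similar lines first, so the negatives are hard to tell from the gold line.
    candidates.sort(key=lambda line: char_jaccard(line, gold_line), reverse=True)
    choices = candidates[:num_negatives] + [gold_line]
    random.Random(seed).shuffle(choices)
    # Field names below are hypothetical, chosen for this sketch.
    return {
        "translation": translation,
        "choices": choices,
        "answer": choices.index(gold_line),
    }


def predict(example):
    """Trivial predictor: pick the choice sharing the most characters with the translation."""
    scores = [char_jaccard(example["translation"], c) for c in example["choices"]]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(examples, predict_fn):
    """Fraction of examples where the predicted index equals the answer index."""
    correct = sum(predict_fn(ex) == ex["answer"] for ex in examples)
    return correct / len(examples)


# Toy corpus of well-known classical lines; the translation is an illustrative paraphrase.
corpus = [
    "床前明月光", "疑是地上霜", "举头望明月", "低头思故乡",
    "春眠不觉晓", "处处闻啼鸟", "夜来风雨声", "花落知多少",
]
example = build_example("抬起头来望着天上的明月", "举头望明月", corpus)
print(example["choices"], example["answer"])
print(accuracy([example], predict))
```

Because the negatives are chosen for surface similarity to the correct line, a predictor that relies only on character overlap would likely struggle on a realistic corpus, which is the motivation for a task that probes semantic understanding.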