ClimaText: A Dataset for Climate Change Topic Detection
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 01.12.2020 |
Subjects | |
Summary: | Climate change communication in the mass media and other textual sources may
affect and shape public perception. Extracting climate change information from
these sources is an important task, e.g., for filtering content and
e-discovery, sentiment analysis, automatic summarization, question-answering,
and fact-checking. However, automating this process is a challenge, as climate
change is a complex, fast-moving, and often ambiguous topic with scarce
resources for popular text-based AI tasks. In this paper, we introduce
ClimaText, a dataset for sentence-based climate change topic
detection, which we make publicly available. We explore different approaches to
identify the climate change topic in various text sources. We find that popular
keyword-based models are not adequate for such a complex and evolving task.
Context-based algorithms like BERT (Devlin et al., 2018) can detect, in
addition to many trivial cases, a variety of complex and implicit topic
patterns. Nevertheless, our analysis reveals considerable room for improvement
in several directions, such as capturing the discussion on indirect
effects of climate change. Hence, we hope this work can serve as a good
starting point for further research on this topic. |
---|---|
DOI: | 10.48550/arxiv.2012.00483 |
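
The summary's point that keyword-based models miss implicit mentions of the topic can be illustrated with a minimal sketch. This is not the authors' baseline: the keyword list, the function name `keyword_match`, and both example sentences are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical keyword list, for illustration only (not taken from the paper).
CLIMATE_KEYWORDS = (
    "climate change",
    "global warming",
    "greenhouse gas",
    "carbon emissions",
)


def keyword_match(sentence, keywords=CLIMATE_KEYWORDS):
    """Return True if any keyword occurs as a whole phrase, ignoring case."""
    text = sentence.lower()
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", text) is not None
        for keyword in keywords
    )


# An explicit mention contains a listed keyword; an implicit one does not.
explicit = "Global warming is accelerating the loss of Arctic sea ice."
implicit = "Insurers are raising premiums as coastal flooding becomes more frequent."

print(keyword_match(explicit))
print(keyword_match(implicit))
```

The second sentence arguably concerns an indirect effect of climate change yet contains none of the listed phrases, which is the kind of case the summary says calls for context-based models.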