Argument discovery via crowdsourcing

Bibliographic Details
Published in The VLDB Journal Vol. 26; no. 4; pp. 511-535
Main Authors Nguyen, Quoc Viet Hung; Duong, Chi Thang; Nguyen, Thanh Tam; Weidlich, Matthias; Aberer, Karl; Yin, Hongzhi; Zhou, Xiaofang
Format Journal Article
Language English
Published Berlin/Heidelberg: Springer Berlin Heidelberg, 01.08.2017
Springer Nature B.V.
Summary: The number of controversial issues being discussed on the Web has been growing dramatically. In articles, blogs, and wikis, people express their points of view in the form of arguments, i.e., claims that are supported by evidence. Discovery of arguments has great potential for informing decision-making. However, argument discovery is hindered by the sheer amount of available Web data and its unstructured, free-text representation. The former calls for automatic text-mining approaches, whereas the latter implies a need for manual processing to extract the structure of arguments. In this paper, we propose a crowdsourcing-based approach to build a corpus of arguments, an argumentation base, thereby mediating the trade-off between automatic text mining and manual processing in argument discovery. We develop an end-to-end process that minimizes the crowd cost while maximizing the quality of crowd answers by: (1) ranking argumentative texts, (2) proactively eliciting user input to extract arguments from these texts, and (3) aggregating heterogeneous crowd answers. Our experiments with real-world datasets highlight that our method discovers virtually all arguments in documents when processing only 25% of the text with more than 80% precision, using only 50% of the budget consumed by a baseline algorithm.
ISSN: 1066-8888, 0949-877X
DOI: 10.1007/s00778-017-0462-9