Detecting computer-generated disinformation

Bibliographic Details
Published in: International Journal of Data Science and Analytics, Vol. 13, No. 4, pp. 363–383
Main Authors: Stiff, Harald; Johansson, Fredrik
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing, 01.05.2022

More Information
Summary: Modern neural language models can be used by malicious actors to automatically produce textual content that looks as if it had been written by genuine human users. Due to progress in the controllability of computer-generated text, there is a risk that state-sponsored actors may start using such methods to conduct large-scale information operations. Various detection algorithms have been suggested in the research literature to identify texts produced by language-model-based generators, but these are often evaluated mainly on test data drawn from the same distribution they were trained on. We evaluate promising Transformer-based detection algorithms in a large variety of experiments involving both in-distribution and out-of-distribution test data, as well as more realistic in-the-wild data. The results show that the generalizability of the detectors is questionable, especially when they are applied to short social media posts. Moreover, the best-performing (RoBERTa-based) detector is shown to lack robustness even to basic adversarial attacks, illustrating how easily malicious actors can evade the current state-of-the-art detection algorithms.
ISSN: 2364-415X, 2364-4168
DOI: 10.1007/s41060-021-00299-5
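
For readers unfamiliar with the class of detectors the abstract discusses, the following is a minimal sketch of how a Transformer-based classifier is typically applied to this task. It assumes the Hugging Face transformers library and the publicly released roberta-base-openai-detector checkpoint as a stand-in; it illustrates the general approach, not the specific detector evaluated by Stiff and Johansson.

# Minimal sketch of a RoBERTa-based machine-generated-text detector.
# Assumptions: the `transformers` and `torch` packages are installed, and the
# public `roberta-base-openai-detector` checkpoint is used for illustration;
# the detectors evaluated in the paper may be configured differently.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-base-openai-detector"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def detect(text: str) -> dict:
    """Return class probabilities for a single input text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    # The label order is checkpoint-specific, so read it from the model
    # config rather than hard-coding "real"/"fake" indices.
    return {model.config.id2label[i]: probs[i].item() for i in range(probs.numel())}

if __name__ == "__main__":
    print(detect("This is a short social media post about the election."))

Running such a classifier on inputs unlike its training data, for instance short social media posts, corresponds to the out-of-distribution setting in which the paper reports questionable generalizability.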