Detecting computer-generated disinformation

Bibliographic Details
Published in: International Journal of Data Science and Analytics, Vol. 13, No. 4, pp. 363–383
Main Authors: Stiff, Harald; Johansson, Fredrik
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing, 01.05.2022

More Information
Summary: Modern neural language models can be used by malicious actors to automatically produce textual content that looks as if it had been written by genuine human users. Due to progress in the controllability of computer-generated text, there is a risk that state-sponsored actors may start using such methods to conduct large-scale information operations. Various detection algorithms have been suggested in the research literature to identify texts produced by language-model-based generators, but these are often evaluated mainly on test data drawn from the same distribution they were trained on. We evaluate promising Transformer-based detection algorithms in a large variety of experiments involving both in-distribution and out-of-distribution test data, as well as more realistic in-the-wild data. The results show that the generalizability of the detectors is questionable, especially when they are applied to short social media posts. Moreover, the best-performing (RoBERTa-based) detector is shown to lack robustness even to basic adversarial attacks, illustrating how easily malicious actors can evade the current state-of-the-art detection algorithms.
ISSN: 2364-415X, 2364-4168
DOI: 10.1007/s41060-021-00299-5
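
For readers unfamiliar with the class of detectors the abstract discusses, the following is a minimal sketch of how a Transformer-based classifier is typically applied to this task. It assumes the Hugging Face transformers library and the publicly released roberta-base-openai-detector checkpoint as a stand-in; it illustrates the general approach, not the specific detector evaluated by Stiff and Johansson.

# Minimal sketch of a RoBERTa-based machine-generated-text detector.
# Assumptions: the `transformers` and `torch` packages are installed, and the
# public `roberta-base-openai-detector` checkpoint is used for illustration;
# the detectors evaluated in the paper may be configured differently.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-base-openai-detector"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def detect(text: str) -> dict:
    """Return class probabilities for a single input text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    # The label order is checkpoint-specific, so read it from the model
    # config rather than hard-coding "real"/"fake" indices.
    return {model.config.id2label[i]: probs[i].item() for i in range(probs.numel())}

if __name__ == "__main__":
    print(detect("This is a short social media post about the election."))

Running such a classifier on inputs unlike its training data, for instance short social media posts, corresponds to the out-of-distribution setting in which the paper reports questionable generalizability.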