Toward Universal Detection of Adversarial Examples via Pseudorandom Classifiers

Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, Vol. 19, pp. 1810-1825
Main Authors: Zhu, Boyu; Dong, Changyu; Zhang, Yuan; Mao, Yunlong; Zhong, Sheng
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024

Summary: Adversarial examples that can fool neural network classifiers have attracted much attention. Existing approaches to detecting adversarial examples leverage a supervised scheme, generating attacks (either targeted or non-targeted) for training the detectors. This means the detectors are geared to the attacks chosen at training time and can be circumvented if the adversary does not act as expected. In this paper, we borrow ideas from cryptography and present a novel approach called the pseudorandom classifier. In a nutshell, a pseudorandom classifier is a classifier equipped with a mapping that encodes the category labels into random multi-bit labels, and a keyed pseudorandom injective function that transforms the input to the classifier. The multi-bit labels enable attack-independent, probabilistic detection of whether an input sample is adversarial. The pseudorandom injection makes existing white-box adversarial example generation methods, which are largely based on back-propagation, no longer applicable. We empirically evaluate our method on MNIST, CIFAR10, Imagenette, CIFAR100, and GTSRB. The results suggest that its performance against adversarial examples is comparable to the state-of-the-art.
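The multi-bit label idea from the summary can be illustrated with a small sketch: each class is assigned a random k-bit codeword, and an input is flagged as likely adversarial when the classifier's predicted bit pattern is not close (in Hamming distance) to any codeword. This is only an illustrative assumption of the general mechanism; the function names, the codebook construction, and the distance threshold below are hypothetical and not the paper's actual construction.

```python
import secrets

def random_codebook(num_classes: int, k: int) -> list[int]:
    """Assign each class a distinct random k-bit codeword.

    With k large relative to num_classes, random codewords are far
    apart in Hamming distance with high probability, so an arbitrary
    bit pattern is unlikely to decode to a valid class.
    """
    codes = set()
    while len(codes) < num_classes:
        codes.add(secrets.randbits(k))  # cryptographically random bits
    return list(codes)

def hamming(a: int, b: int) -> int:
    """Hamming distance between two equal-length bit patterns."""
    return bin(a ^ b).count("1")

def decode(bits: int, codebook: list[int], threshold: int):
    """Map a predicted bit pattern back to a class.

    Returns (class_index, False) if the pattern is within `threshold`
    bits of some codeword, or (None, True) to flag the input as
    likely adversarial.
    """
    best = min(range(len(codebook)), key=lambda i: hamming(bits, codebook[i]))
    if hamming(bits, codebook[best]) <= threshold:
        return best, False
    return None, True
```

A benign input whose predicted bits land on (or near) a codeword decodes to its class; a perturbed input that pushes the prediction away from every codeword is rejected, which is what makes the detection attack-independent: no specific attack is modeled at training time.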
ISSN: 1556-6013, 1556-6021
DOI: 10.1109/TIFS.2023.3340889