Toward Universal Detection of Adversarial Examples via Pseudorandom Classifiers
Published in | IEEE Transactions on Information Forensics and Security, Vol. 19, pp. 1810-1825 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | New York: IEEE, 2024 |
Publisher | The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Summary: | Adversarial examples that can fool neural network classifiers have attracted much attention. Existing approaches to detecting adversarial examples use a supervised scheme, generating attacks (either targeted or non-targeted) to train the detectors; the detectors are therefore geared to the attacks chosen at training time and can be circumvented if the adversary does not act as expected. In this paper, we borrow ideas from cryptography and present a novel approach called the pseudorandom classifier. In a nutshell, a pseudorandom classifier is a classifier equipped with a mapping that encodes the category labels into random multi-bit labels, and a keyed pseudorandom injective function that transforms the input to the classifier. The multi-bit labels enable attack-independent, probabilistic detection of whether an input sample is adversarial. The pseudorandom injection renders existing white-box adversarial example generation methods, which are largely based on back-propagation, inapplicable. We empirically evaluate our method on MNIST, CIFAR10, Imagenette, CIFAR100, and GTSRB. The results suggest that its performance against adversarial examples is comparable to the state of the art. |
---|---|
ISSN: | 1556-6013, 1556-6021 |
DOI: | 10.1109/TIFS.2023.3340889 |
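The abstract's two mechanisms can be illustrated with a minimal sketch. This is not the authors' implementation: the codeword length, detection threshold, and the choice of a key-seeded feature permutation as the "pseudorandom injective function" are all illustrative assumptions. The idea shown is that each category label maps to a random multi-bit codeword, and a prediction that is far from every codeword is flagged as likely adversarial.

```python
import random

# Hedged sketch of the abstract's two mechanisms, under assumptions:
# N_CLASSES, CODE_BITS, and THRESHOLD are illustrative values, not the
# paper's parameters.

N_CLASSES = 10   # assumption: e.g. a CIFAR10-like label space
CODE_BITS = 32   # assumption: length of each random multi-bit label
THRESHOLD = 5    # assumption: max Hamming distance to accept a decode

def random_codebook(n_classes, n_bits, rng=None):
    """Map each category label to a random n_bits codeword (list of 0/1)."""
    rng = rng or random.Random()
    return {c: [rng.randrange(2) for _ in range(n_bits)]
            for c in range(n_classes)}

def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(pred_bits, codebook, threshold=THRESHOLD):
    """Nearest-codeword decoding: return (class, distance), or
    (None, distance) when no codeword is close enough -- the
    probabilistic 'this sample is adversarial' signal."""
    best_c, best_d = min(
        ((c, hamming(pred_bits, cw)) for c, cw in codebook.items()),
        key=lambda t: t[1])
    return (best_c, best_d) if best_d <= threshold else (None, best_d)

def keyed_injection(x, key):
    """Illustrative stand-in for the keyed pseudorandom injective
    transform: a key-seeded permutation of the input features
    (a permutation is injective, and the key is secret)."""
    idx = list(range(len(x)))
    random.Random(key).shuffle(idx)
    return [x[i] for i in idx]
```

In this sketch, a clean prediction lands exactly on its class's codeword (distance 0) and decodes to that class, while a bit string far from every codeword decodes to `None`, i.e., is rejected; the permutation, being keyed and unknown to the attacker, stands in for the property that white-box gradient-based attacks cannot be run through the input transform.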