Characterizing the Optimal 0-1 Loss for Multi-class Classification with a Test-time Attacker

Bibliographic Details
Main Authors: Dai, Sihui; Ding, Wenxin; Bhagoji, Arjun Nitin; Cullina, Daniel; Zhao, Ben Y.; Zheng, Haitao; Mittal, Prateek
Format: Journal Article
Language: English
Published: 21.02.2023
Summary: Finding classifiers robust to adversarial examples is critical for their safe deployment. Determining the robustness of the best possible classifier under a given threat model for a given data distribution and comparing it to that achieved by state-of-the-art training methods is thus an important diagnostic tool. In this paper, we find achievable information-theoretic lower bounds on loss in the presence of a test-time attacker for multi-class classifiers on any discrete dataset. We provide a general framework for finding the optimal 0-1 loss that revolves around the construction of a conflict hypergraph from the data and adversarial constraints. We further define other variants of the attacker-classifier game that determine the range of the optimal loss more efficiently than the full-fledged hypergraph construction. Our evaluation shows, for the first time, an analysis of the gap to optimal robustness for classifiers in the multi-class setting on benchmark datasets.
DOI: 10.48550/arxiv.2302.10722
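The summary describes the framework only at a high level. As a rough illustration (not the authors' implementation), the sketch below builds the pairwise special case of the construction: a conflict graph whose vertices are labeled examples and whose edges join differently labeled points that an attacker could map to a common perturbed input. Solving the resulting linear program lower-bounds the optimal 0-1 loss. The L2 threat model, the radius parameter eps, and the function name are all assumptions made for this sketch.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def pairwise_optimal_01_loss_bound(points, labels, eps):
    """Lower-bound the optimal adversarial 0-1 loss via a pairwise
    conflict graph (illustrative sketch; the paper's exact answer
    uses hyperedges, not just edges).

    points : (n, d) array of discrete data points
    labels : length-n array of class labels
    eps    : L2 perturbation budget of the test-time attacker (assumed)
    """
    n = len(points)
    # Two differently labeled points conflict when their eps-balls
    # intersect, i.e. the attacker can push both to the same input.
    edges = [
        (i, j)
        for i, j in itertools.combinations(range(n), 2)
        if labels[i] != labels[j]
        and np.linalg.norm(points[i] - points[j]) <= 2 * eps
    ]
    # LP variables q_v = probability the classifier is correct on
    # vertex v. Each conflict edge forces q_i + q_j <= 1. Maximize
    # the average correct probability (linprog minimizes, so negate).
    c = -np.ones(n) / n
    if edges:
        A = np.zeros((len(edges), n))
        for k, (i, j) in enumerate(edges):
            A[k, i] = A[k, j] = 1.0
        res = linprog(c, A_ub=A, b_ub=np.ones(len(edges)),
                      bounds=[(0.0, 1.0)] * n, method="highs")
    else:
        res = linprog(c, bounds=[(0.0, 1.0)] * n, method="highs")
    # Optimal loss = 1 - (max average correct probability).
    return 1.0 + res.fun
```

In the full framework, edges are replaced by hyperedges, sets of vertices whose perturbation balls share a common point, and the LP gains one constraint per hyperedge; in the multi-class setting this turns the pairwise lower bound above into the exact optimal loss, at the cost of the more expensive hypergraph construction the summary mentions.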