Characterizing the Optimal 0-1 Loss for Multi-class Classification with a Test-time Attacker
Main Authors:
Format: Journal Article
Language: English
Published: 21.02.2023
Subjects:
Summary: Finding classifiers robust to adversarial examples is critical for their safe deployment. Determining the robustness of the best possible classifier under a given threat model for a given data distribution, and comparing it to that achieved by state-of-the-art training methods, is thus an important diagnostic tool. In this paper, we find achievable information-theoretic lower bounds on loss in the presence of a test-time attacker for multi-class classifiers on any discrete dataset. We provide a general framework for finding the optimal 0-1 loss that revolves around the construction of a conflict hypergraph from the data and adversarial constraints. We further define other variants of the attacker-classifier game that determine the range of the optimal loss more efficiently than the full-fledged hypergraph construction. Our evaluation shows, for the first time, an analysis of the gap to optimal robustness for classifiers in the multi-class setting on benchmark datasets.
DOI: 10.48550/arxiv.2302.10722
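
To make the conflict-hypergraph idea from the summary concrete, the following is a minimal sketch, not the authors' released code, of the pairwise (edge-size-2) special case of the construction. Vertices are labeled examples; an edge joins two differently labeled points whose perturbation balls overlap, so a test-time attacker could map both to a common input. A linear program over these constraints then lower-bounds the optimal 0-1 loss (the full method in the paper uses hyperedges of all sizes to obtain the exact value). The dataset, L2 metric, and budget `eps` below are illustrative assumptions.

```python
# Sketch of a pairwise conflict-graph lower bound on the optimal
# adversarial 0-1 loss, assuming an L2-ball attacker with budget eps.
import itertools

import numpy as np
from scipy.optimize import linprog


def optimal_loss_lower_bound(X, y, eps):
    """Lower-bound the optimal 0-1 loss under a test-time attacker.

    X   : (n, d) array of examples
    y   : (n,) array of integer class labels
    eps : L2 perturbation budget of the attacker
    """
    n = len(X)
    # Two differently labeled points conflict if their eps-balls
    # intersect, i.e. the attacker can perturb both to the same input.
    edges = [
        (i, j)
        for i, j in itertools.combinations(range(n), 2)
        if y[i] != y[j] and np.linalg.norm(X[i] - X[j]) <= 2 * eps
    ]
    if not edges:
        return 0.0  # no conflicts: zero loss is achievable
    # LP: maximize sum_v q_v, where q_v is the probability that the
    # (randomized) classifier labels vertex v correctly, subject to
    # q_i + q_j <= 1 for every conflicting pair and 0 <= q_v <= 1.
    c = -np.ones(n)  # linprog minimizes, so negate the objective
    A = np.zeros((len(edges), n))
    for row, (i, j) in enumerate(edges):
        A[row, i] = A[row, j] = 1.0
    res = linprog(c, A_ub=A, b_ub=np.ones(len(edges)), bounds=[(0, 1)] * n)
    return 1.0 - (-res.fun) / n  # optimal loss is at least this value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 2))
    y = (X[:, 0] > 0).astype(int)
    print(optimal_loss_lower_bound(X, y, eps=0.3))
```

Restricting to pairwise edges only removes constraints from the exact formulation, so the LP optimum can only be optimistic; this is why the value returned above is a lower bound on the optimal loss rather than the optimal loss itself.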