URET: Universal Robustness Evaluation Toolkit (for Evasion)
Main Authors | , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 03.08.2023 |
Subjects | |
Summary: | Machine learning models are known to be vulnerable to adversarial evasion
attacks as illustrated by image classification models. Thoroughly understanding
such attacks is critical in order to ensure the safety and robustness of
critical AI tasks. However, most evasion attacks are difficult to deploy
against a majority of AI systems because they have focused on the image domain with
only a few constraints. An image is composed of homogeneous, numerical,
continuous, and independent features, unlike many other input types to AI
systems used in practice. Furthermore, some input types include additional
semantic and functional constraints that must be observed to generate realistic
adversarial inputs. In this work, we propose a new framework to enable the
generation of adversarial inputs irrespective of the input type and task
domain. Given an input and a set of pre-defined input transformations, our
framework discovers a sequence of transformations that results in a semantically
correct and functional adversarial input. We demonstrate the generality of our
approach on several diverse machine learning tasks with various input
representations. We also show the importance of generating adversarial examples
as they enable the deployment of mitigation techniques. |
---|---|
DOI: | 10.48550/arxiv.2308.01840 |
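
The summary describes searching for a sequence of pre-defined input transformations that yields a valid adversarial input. The sketch below illustrates that idea only in miniature and is not URET's API: the greedy search strategy, the function `greedy_transform_search`, the toy keyword scorer `toy_score`, the validity check, and the three string transformations are all hypothetical names and assumptions chosen for illustration. The summary does not state which search strategy the framework uses.

```python
from typing import Callable, List, Sequence, Tuple

Transform = Callable[[str], str]


def greedy_transform_search(
    x: str,
    transforms: Sequence[Tuple[str, Transform]],
    score: Callable[[str], float],
    is_valid: Callable[[str], bool],
    threshold: float = 0.5,
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Greedily chain transformations until `score` drops below `threshold`.

    Assumption: greedy selection is used here for brevity; it is not
    claimed to be the strategy used by the framework in the summary.
    """
    current, applied, best = x, [], score(x)
    for _ in range(max_steps):
        if best < threshold:
            break  # the toy model no longer flags the input
        # Apply every transformation once and keep only valid candidates.
        candidates = []
        for name, fn in transforms:
            cand = fn(current)
            if is_valid(cand):
                candidates.append((score(cand), name, cand))
        if not candidates:
            break
        # On ties, min() returns the first candidate with the lowest score.
        cand_score, name, cand = min(candidates, key=lambda c: c[0])
        if cand_score >= best:
            break  # no valid transformation lowers the score further
        current, applied, best = cand, applied + [name], cand_score
    return current, applied


# Hypothetical stand-in for a model: fraction of flagged words present.
FLAGGED = {"free", "winner"}


def toy_score(text: str) -> float:
    tokens = text.lower().split()
    return sum(word in tokens for word in FLAGGED) / len(FLAGGED)


original = "free prize for the winner"

# Hypothetical transformations; "drop_all" violates the validity constraint.
transforms = [
    ("leet_free", lambda s: s.replace("free", "fr3e")),
    ("leet_winner", lambda s: s.replace("winner", "w1nner")),
    ("drop_all", lambda s: ""),
]


def is_valid(text: str) -> bool:
    # Toy semantic constraint: the word count must be preserved.
    return len(text.split()) == len(original.split())


adv, steps = greedy_transform_search(original, transforms, toy_score, is_valid)
print(adv, steps)
```

In this toy setting the validity check plays the role of the semantic and functional constraints mentioned in the summary: a transformation that would trivially evade the scorer by deleting the input is rejected before it is ever scored.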