Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models

Bibliographic Details
Published in Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 24635 - 24644
Main Authors Sato, Takami; Yue, Justin; Chen, Nanze; Wang, Ningfei; Chen, Qi Alfred
Format Conference Proceeding
Language English
Published IEEE 16.06.2024
Subjects
Online Access Get full text

Abstract Denoising probabilistic diffusion models have shown breakthrough performance in generating more photo-realistic images and human-level illustrations than prior models such as GANs. This high image-generation capability has stimulated the creation of many downstream applications in various areas. However, we find that this technology is actually a double-edged sword: we identify a new type of attack, called the Natural Denoising Diffusion (NDD) attack, based on the finding that state-of-the-art deep neural network (DNN) models still hold their predictions even if we intentionally remove, through text prompts, the robust features that are essential to the human visual system (HVS). The NDD attack shows a significantly high capability to generate low-cost, model-agnostic, and transferable adversarial attacks by exploiting the natural attack capability in diffusion models. To systematically evaluate the risk of the NDD attack, we perform a large-scale empirical study with our newly created dataset, the Natural Denoising Diffusion Attack (NDDA) dataset. We evaluate the natural attack capability by answering 6 research questions. Through a user study, we find that the NDD attack can achieve an 88% detection rate while remaining stealthy to 93% of human subjects; we also find that the non-robust features embedded by diffusion models contribute to the natural attack capability. To confirm the model-agnostic and transferable attack capability, we perform the NDD attack against a Tesla Model 3 and find that 73% of the physically printed attacks are detected as stop signs. We hope that this study and dataset can help our community become aware of the risks in diffusion models and facilitate further research toward robust DNN models.
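The abstract describes generating adversarial objects by prompting a text-to-image diffusion model to remove the robust features a human relies on (e.g., the red color and legend of a stop sign) and then checking whether a DNN detector still recognizes the object. The sketch below is a minimal illustration of that idea, assuming the Hugging Face diffusers and transformers packages; the model checkpoints, prompt, and confidence threshold are illustrative assumptions, not the authors' actual NDDA generation pipeline.

import torch
from diffusers import StableDiffusionPipeline
from transformers import pipeline

# Text-to-image diffusion model (illustrative checkpoint choice).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Prompt that intentionally strips robust, human-salient features.
prompt = "a photorealistic stop sign with no red color and no text on it"
image = pipe(prompt).images[0]

# Off-the-shelf object detector standing in for a victim DNN model.
detector = pipeline("object-detection", model="facebook/detr-resnet-50")
for det in detector(image):
    if det["score"] > 0.5:  # illustrative confidence threshold
        print(det["label"], round(det["score"], 3))

If the detector still reports "stop sign" on an image that no longer looks like one to a human, the generated image behaves as a natural, model-agnostic adversarial example in the sense the abstract describes.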
Author Sato, Takami
Chen, Nanze
Yue, Justin
Wang, Ningfei
Chen, Qi Alfred
Author_xml – sequence: 1
  givenname: Takami
  surname: Sato
  fullname: Sato, Takami
  email: takamis@uci.edu
  organization: University of California, Irvine
– sequence: 2
  givenname: Justin
  surname: Yue
  fullname: Yue, Justin
  email: jpyue@uci.edu
  organization: University of California, Irvine
– sequence: 3
  givenname: Nanze
  surname: Chen
  fullname: Chen, Nanze
  email: nc630@cam.ac.uk
  organization: University of Cambridge
– sequence: 4
  givenname: Ningfei
  surname: Wang
  fullname: Wang, Ningfei
  email: ningfei.wang@uci.edu
  organization: University of California, Irvine
– sequence: 5
  givenname: Qi Alfred
  surname: Chen
  fullname: Chen, Qi Alfred
  email: alfchen@uci.edu
  organization: University of California, Irvine
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/CVPR52733.2024.02326
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
EISBN 9798350353006
EISSN 1063-6919
EndPage 24644
ExternalDocumentID 10654867
Genre orig-research
GrantInformation_xml – fundername: NSF
  grantid: CNS-2145493,CNS-1929771,CNS-1932464
  funderid: 10.13039/100000001
GroupedDBID 6IE
6IH
6IL
6IN
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IJVOP
OCL
RIE
RIL
RIO
IEDL.DBID RIE
IngestDate Wed Aug 27 02:00:48 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 10
ParticipantIDs ieee_primary_10654867
PublicationCentury 2000
PublicationDate 2024-June-16
PublicationDateYYYYMMDD 2024-06-16
PublicationDate_xml – month: 06
  year: 2024
  text: 2024-June-16
  day: 16
PublicationDecade 2020
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0003211698
SourceID ieee
SourceType Publisher
StartPage 24635
SubjectTerms Adversarial Attack
Artificial neural networks
Autonomous Driving
Diffusion Model
Diffusion models
Feature extraction
Noise reduction
Predictive models
Safety
Security
Text to image
Visual systems
Title Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
URI https://ieeexplore.ieee.org/document/10654867
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+%28IEEE+Computer+Society+Conference+on+Computer+Vision+and+Pattern+Recognition.+Online%29&rft.atitle=Intriguing+Properties+of+Diffusion+Models%3A+An+Empirical+Study+of+the+Natural+Attack+Capability+in+Text-to-Image+Generative+Models&rft.au=Sato%2C+Takami&rft.au=Yue%2C+Justin&rft.au=Chen%2C+Nanze&rft.au=Wang%2C+Ningfei&rft.date=2024-06-16&rft.pub=IEEE&rft.eissn=1063-6919&rft.spage=24635&rft.epage=24644&rft_id=info:doi/10.1109%2FCVPR52733.2024.02326&rft.externalDocID=10654867