Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models

Bibliographic Details
Published in Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 24635 - 24644
Main Authors Sato, Takami; Yue, Justin; Chen, Nanze; Wang, Ningfei; Chen, Qi Alfred
Format Conference Proceeding
Language English
Published IEEE 16.06.2024
Subjects
Online Access Get full text

Abstract Denoising probabilistic diffusion models have shown breakthrough performance in generating more photo-realistic images and human-level illustrations than prior models such as GANs. This high image-generation capability has stimulated the creation of many downstream applications in various areas. However, we find that this technology is actually a double-edged sword: we identify a new type of attack, called the Natural Denoising Diffusion (NDD) attack, based on the finding that state-of-the-art deep neural network (DNN) models still hold their predictions even if we intentionally remove, through text prompts, the robust features that are essential to the human visual system (HVS). The NDD attack shows a significantly high capability to generate low-cost, model-agnostic, and transferable adversarial attacks by exploiting the natural attack capability in diffusion models. To systematically evaluate the risk of the NDD attack, we perform a large-scale empirical study with our newly created dataset, the Natural Denoising Diffusion Attack (NDDA) dataset. We evaluate the natural attack capability by answering 6 research questions. Through a user study, we find that the NDD attack can achieve an 88% detection rate while remaining stealthy to 93% of human subjects; we also find that the non-robust features embedded by diffusion models contribute to the natural attack capability. To confirm the model-agnostic and transferable attack capability, we perform the NDD attack against a Tesla Model 3 and find that 73% of the physically printed attacks are detected as stop signs. We hope that this study and dataset can help our community become aware of the risks in diffusion models and facilitate further research toward robust DNN models.
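The abstract describes generating adversarial objects by prompting a text-to-image diffusion model to remove the robust features a human relies on (e.g., the red color and legend of a stop sign) and then checking whether a DNN detector still recognizes the object. The sketch below is a minimal illustration of that idea, assuming the Hugging Face diffusers and transformers packages; the model checkpoints, prompt, and confidence threshold are illustrative assumptions, not the authors' actual NDDA generation pipeline.

import torch
from diffusers import StableDiffusionPipeline
from transformers import pipeline

# Text-to-image diffusion model (illustrative checkpoint choice).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Prompt that intentionally strips robust, human-salient features.
prompt = "a photorealistic stop sign with no red color and no text on it"
image = pipe(prompt).images[0]

# Off-the-shelf object detector standing in for a victim DNN model.
detector = pipeline("object-detection", model="facebook/detr-resnet-50")
for det in detector(image):
    if det["score"] > 0.5:  # illustrative confidence threshold
        print(det["label"], round(det["score"], 3))

If the detector still reports "stop sign" on an image that no longer looks like one to a human, the generated image behaves as a natural, model-agnostic adversarial example in the sense the abstract describes.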
Author Sato, Takami
Chen, Nanze
Yue, Justin
Wang, Ningfei
Chen, Qi Alfred
Author_xml – sequence: 1
  givenname: Takami
  surname: Sato
  fullname: Sato, Takami
  email: takamis@uci.edu
  organization: University of California, Irvine
– sequence: 2
  givenname: Justin
  surname: Yue
  fullname: Yue, Justin
  email: jpyue@uci.edu
  organization: University of California, Irvine
– sequence: 3
  givenname: Nanze
  surname: Chen
  fullname: Chen, Nanze
  email: nc630@cam.ac.uk
  organization: University of Cambridge
– sequence: 4
  givenname: Ningfei
  surname: Wang
  fullname: Wang, Ningfei
  email: ningfei.wang@uci.edu
  organization: University of California, Irvine
– sequence: 5
  givenname: Qi Alfred
  surname: Chen
  fullname: Chen, Qi Alfred
  email: alfchen@uci.edu
  organization: University of California, Irvine
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/CVPR52733.2024.02326
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
EISBN 9798350353006
EISSN 1063-6919
EndPage 24644
ExternalDocumentID 10654867
Genre orig-research
GrantInformation_xml – fundername: NSF
  grantid: CNS-2145493,CNS-1929771,CNS-1932464
  funderid: 10.13039/100000001
GroupedDBID 6IE
6IH
6IL
6IN
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IJVOP
OCL
RIE
RIL
RIO
IEDL.DBID RIE
IngestDate Wed Aug 27 02:00:48 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 10
ParticipantIDs ieee_primary_10654867
PublicationCentury 2000
PublicationDate 2024-June-16
PublicationDateYYYYMMDD 2024-06-16
PublicationDate_xml – month: 06
  year: 2024
  text: 2024-June-16
  day: 16
PublicationDecade 2020
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0003211698
SourceID ieee
SourceType Publisher
StartPage 24635
SubjectTerms Adversarial Attack
Artificial neural networks
Autonomous Driving
Diffusion Model
Diffusion models
Feature extraction
Noise reduction
Predictive models
Safety
Security
Text to image
Visual systems
Title Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
URI https://ieeexplore.ieee.org/document/10654867
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+%28IEEE+Computer+Society+Conference+on+Computer+Vision+and+Pattern+Recognition.+Online%29&rft.atitle=Intriguing+Properties+of+Diffusion+Models%3A+An+Empirical+Study+of+the+Natural+Attack+Capability+in+Text-to-Image+Generative+Models&rft.au=Sato%2C+Takami&rft.au=Yue%2C+Justin&rft.au=Chen%2C+Nanze&rft.au=Wang%2C+Ningfei&rft.date=2024-06-16&rft.pub=IEEE&rft.eissn=1063-6919&rft.spage=24635&rft.epage=24644&rft_id=info:doi/10.1109%2FCVPR52733.2024.02326&rft.externalDocID=10654867