SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image Models

Bibliographic Details
Main Authors: Li, Xinfeng; Yang, Yuchen; Deng, Jiangyi; Yan, Chen; Chen, Yanjiao; Ji, Xiaoyu; Xu, Wenyuan
Format: Journal Article
Language: English
Published: 2024-04-09

Abstract: Text-to-image (T2I) models, such as Stable Diffusion, have exhibited remarkable performance in generating high-quality images from text descriptions in recent years. However, text-to-image models may be tricked into generating not-safe-for-work (NSFW) content, particularly in sexually explicit scenarios. Existing countermeasures mostly focus on filtering inappropriate inputs and outputs, or suppressing improper text embeddings, which can block sexually explicit content (e.g., naked) but may still be vulnerable to adversarial prompts -- inputs that appear innocent but are ill-intended. In this paper, we present SafeGen, a framework to mitigate sexual content generation by text-to-image models in a text-agnostic manner. The key idea is to eliminate explicit visual representations from the model regardless of the text input. In this way, the text-to-image model is resistant to adversarial prompts since such unsafe visual representations are obstructed from within. Extensive experiments conducted on four datasets and large-scale user studies demonstrate SafeGen's effectiveness in mitigating sexually explicit content generation while preserving the high fidelity of benign images. SafeGen outperforms eight state-of-the-art baseline methods and achieves 99.4% sexual content removal performance. Furthermore, our constructed benchmark of adversarial prompts provides a basis for future development and evaluation of anti-NSFW-generation methods.
Copyright: http://creativecommons.org/licenses/by-nc-nd/4.0
DOI: 10.48550/arxiv.2404.06666
Open Access: https://arxiv.org/abs/2404.06666
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Cryptography and Security