SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image Models
Main Authors | Li, Xinfeng; Yang, Yuchen; Deng, Jiangyi; Yan, Chen; Chen, Yanjiao; Ji, Xiaoyu; Xu, Wenyuan |
---|---|
Format | Journal Article |
Language | English |
Published | 09.04.2024 |
Subjects | Computer Science - Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Cryptography and Security |
Online Access | https://arxiv.org/abs/2404.06666 |
Abstract | Text-to-image (T2I) models, such as Stable Diffusion, have exhibited
remarkable performance in generating high-quality images from text descriptions
in recent years. However, text-to-image models may be tricked into generating
not-safe-for-work (NSFW) content, particularly in sexually explicit scenarios.
Existing countermeasures mostly focus on filtering inappropriate inputs and
outputs, or suppressing improper text embeddings, which can block sexually
explicit content (e.g., naked) but may still be vulnerable to adversarial
prompts -- inputs that appear innocent but are ill-intended. In this paper, we
present SafeGen, a framework to mitigate sexual content generation by
text-to-image models in a text-agnostic manner. The key idea is to eliminate
explicit visual representations from the model regardless of the text input. In
this way, the text-to-image model is resistant to adversarial prompts since
such unsafe visual representations are obstructed from within. Extensive
experiments conducted on four datasets and large-scale user studies demonstrate
SafeGen's effectiveness in mitigating sexually explicit content generation
while preserving the high-fidelity of benign images. SafeGen outperforms eight
state-of-the-art baseline methods and achieves 99.4% sexual content removal
performance. Furthermore, our constructed benchmark of adversarial prompts
provides a basis for future development and evaluation of anti-NSFW-generation
methods. |
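The abstract contrasts input/output filtering, which adversarial prompts can bypass, with SafeGen's text-agnostic idea of removing unsafe visual representations from the model itself. A toy sketch of that contrast, under stated assumptions: this is not the paper's implementation, and every name below (the concept table, the blocklist, both defense functions) is a hypothetical stand-in for illustration only.

```python
# Toy illustration of two defense strategies against unsafe generation.
# All identifiers here are hypothetical; a real T2I model is a diffusion
# pipeline, not a lookup table.

BLOCKLIST = {"naked", "nude"}

# A toy "model": maps prompts to the visual concept they would render.
CONCEPTS = {
    "naked": "explicit",
    "artistic figure study": "explicit",  # adversarial: innocent wording
    "sunset beach": "benign",
}

def generate(model, prompt):
    """Render whatever concept the model associates with the prompt."""
    return model.get(prompt, "benign")

def input_filter_defense(prompt):
    """External filtering: block prompts containing flagged words."""
    if any(word in prompt for word in BLOCKLIST):
        return "blocked"
    return generate(CONCEPTS, prompt)

def text_agnostic_defense(prompt):
    """SafeGen-style idea: strip unsafe visual representations from the
    model itself, so no prompt, adversarial or not, can reach them."""
    safe_model = {p: c for p, c in CONCEPTS.items() if c != "explicit"}
    return generate(safe_model, prompt)

# The filter blocks the obvious prompt but misses the adversarial one;
# the text-agnostic model resists both.
print(input_filter_defense("naked"))                   # blocked
print(input_filter_defense("artistic figure study"))   # explicit
print(text_agnostic_defense("artistic figure study"))  # benign
```

The point of the sketch: the filter inspects text, so it fails exactly where the text looks innocent; the text-agnostic defense edits the model's capability, so the prompt wording is irrelevant.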
---|---|
Author | Li, Xinfeng; Yang, Yuchen; Deng, Jiangyi; Yan, Chen; Chen, Yanjiao; Ji, Xiaoyu; Xu, Wenyuan |
Copyright | http://creativecommons.org/licenses/by-nc-nd/4.0 |
DOI | 10.48550/arxiv.2404.06666 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
OpenAccessLink | https://arxiv.org/abs/2404.06666 |
PublicationDate | 2024-04-09 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Cryptography and Security |
Title | SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image Models |
URI | https://arxiv.org/abs/2404.06666 |
linkProvider | Cornell University |