SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image Models

Bibliographic Details
Main Authors: Li, Xinfeng; Yang, Yuchen; Deng, Jiangyi; Yan, Chen; Chen, Yanjiao; Ji, Xiaoyu; Xu, Wenyuan
Format: Journal Article
Language: English
Published: 2024-04-09

Abstract: Text-to-image (T2I) models, such as Stable Diffusion, have exhibited remarkable performance in generating high-quality images from text descriptions in recent years. However, text-to-image models may be tricked into generating not-safe-for-work (NSFW) content, particularly in sexually explicit scenarios. Existing countermeasures mostly focus on filtering inappropriate inputs and outputs, or suppressing improper text embeddings, which can block sexually explicit content (e.g., naked) but may still be vulnerable to adversarial prompts -- inputs that appear innocent but are ill-intended. In this paper, we present SafeGen, a framework to mitigate sexual content generation by text-to-image models in a text-agnostic manner. The key idea is to eliminate explicit visual representations from the model regardless of the text input. In this way, the text-to-image model is resistant to adversarial prompts since such unsafe visual representations are obstructed from within. Extensive experiments conducted on four datasets and large-scale user studies demonstrate SafeGen's effectiveness in mitigating sexually explicit content generation while preserving the high fidelity of benign images. SafeGen outperforms eight state-of-the-art baseline methods and achieves 99.4% sexual content removal performance. Furthermore, our constructed benchmark of adversarial prompts provides a basis for future development and evaluation of anti-NSFW-generation methods.
Copyright: http://creativecommons.org/licenses/by-nc-nd/4.0
DOI: 10.48550/arxiv.2404.06666
Open Access: https://arxiv.org/abs/2404.06666
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Cryptography and Security