SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

Bibliographic Details
Published in Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 8069-8078
Main Authors Zhang, Yuxuan; Song, Yiren; Liu, Jiaming; Wang, Rui; Yu, Jinpeng; Tang, Hao; Li, Huaxia; Tang, Xu; Hu, Yao; Pan, Han; Jing, Zhongliang
Format Conference Proceeding
Language English
Published IEEE 16.06.2024
Abstract Recent advancements in subject-driven image generation have led to zero-shot generation, yet precise selection and focus on crucial subject representations remain challenging. Addressing this, we introduce the SSR-Encoder, a novel architecture designed for selectively capturing any subject from single or multiple reference images. It responds to various query modalities including text and masks, without necessitating test-time fine-tuning. The SSR-Encoder combines a Token-to-Patch Aligner that aligns query inputs with image patches and a Detail-Preserving Subject Encoder for extracting and preserving fine features of the subjects, thereby generating subject embeddings. These embeddings, used in conjunction with original text embeddings, condition the generation process. Characterized by its model generalizability and efficiency, the SSR-Encoder adapts to a range of custom models and control modules. Enhanced by the Embedding Consistency Regularization Loss for improved training, our extensive experiments demonstrate its effectiveness in versatile and high-quality image generation, indicating its broad applicability. Project page: ssr-encoder.github.io
Author_xml – sequence: 1
  givenname: Yuxuan
  surname: Zhang
  fullname: Zhang, Yuxuan
  email: zyx153@sjtu.edu.cn
  organization: Shanghai Jiao Tong University
– sequence: 2
  givenname: Yiren
  surname: Song
  fullname: Song, Yiren
  email: yiren@nus.edu.sg
  organization: National University of Singapore
– sequence: 3
  givenname: Jiaming
  surname: Liu
  fullname: Liu, Jiaming
  email: jmliu1217@gmail.com
  organization: Xiaohongshu Inc
– sequence: 4
  givenname: Rui
  surname: Wang
  fullname: Wang, Rui
  email: wr_bupt@bupt.edu.cn
  organization: Beijing University of Posts and Telecommunications
– sequence: 5
  givenname: Jinpeng
  surname: Yu
  fullname: Yu, Jinpeng
  email: yujp1@shanghaitech.edu.cn
  organization: ShanghaiTech University
– sequence: 6
  givenname: Hao
  surname: Tang
  fullname: Tang, Hao
  email: haotang2@cmu.edu
  organization: Carnegie Mellon University
– sequence: 7
  givenname: Huaxia
  surname: Li
  fullname: Li, Huaxia
  email: xiahou@xiaohongshu.com
  organization: Xiaohongshu Inc
– sequence: 8
  givenname: Xu
  surname: Tang
  fullname: Tang, Xu
  email: tangshen@xiaohongshu.com
  organization: Xiaohongshu Inc
– sequence: 9
  givenname: Yao
  surname: Hu
  fullname: Hu, Yao
  email: yicheng@xiaohongshu.com
  organization: Xiaohongshu Inc
– sequence: 10
  givenname: Han
  surname: Pan
  fullname: Pan, Han
  email: hanpan@sjtu.edu.cn
  organization: Shanghai Jiao Tong University
– sequence: 11
  givenname: Zhongliang
  surname: Jing
  fullname: Jing, Zhongliang
  email: zljing@sjtu.edu.cn
  organization: Shanghai Jiao Tong University
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR52733.2024.00771
Discipline Applied Sciences
EISBN 9798350353006
EISSN 1063-6919
EndPage 8078
ExternalDocumentID 10657078
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
ParticipantIDs ieee_primary_10657078
PublicationCentury 2000
PublicationDate 2024-June-16
PublicationDateYYYYMMDD 2024-06-16
PublicationDate_xml – month: 06
  year: 2024
  text: 2024-June-16
  day: 16
PublicationDecade 2020
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 8069
SubjectTerms Adaptation models
Computer vision
Ecosystems
Feature extraction
Image coding
Image synthesis
Training
Title SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
URI https://ieeexplore.ieee.org/document/10657078