AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

Bibliographic Details
Published in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1316 - 1324
Main Authors Xu, Tao; Zhang, Pengchuan; Huang, Qiuyuan; Zhang, Han; Gan, Zhe; Huang, Xiaolei; He, Xiaodong
Format Conference Proceeding
Language English
Published IEEE 01.06.2018

Abstract In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation. With a novel attentional generative network, the AttnGAN can synthesize fine-grained details at different sub-regions of the image by paying attentions to the relevant words in the natural language description. In addition, a deep attentional multimodal similarity model is proposed to compute a fine-grained image-text matching loss for training the generator. The proposed AttnGAN significantly outperforms the previous state of the art, boosting the best reported inception score by 14.14% on the CUB dataset and 170.25% on the more challenging COCO dataset. A detailed analysis is also performed by visualizing the attention layers of the AttnGAN. It for the first time shows that the layered attentional GAN is able to automatically select the condition at the word level for generating different parts of the image.
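The word-level attention described in the abstract can be illustrated with a short sketch: at each refinement stage, every image sub-region attends over the caption's word embeddings and receives a word-context vector that conditions the next generator stage. The snippet below is a minimal illustration in PyTorch, not the authors' released code; the function name, tensor shapes, and the projection layer are assumptions made for the example.

```python
# Minimal sketch (assumed shapes/names) of AttnGAN-style word-level attention:
# each image sub-region attends over the caption's word embeddings.
import torch
import torch.nn.functional as F


def word_attention(region_feats, word_embs, proj):
    """Compute word-context vectors for each image sub-region.

    region_feats: (batch, D_hat, N)  hidden features for N sub-regions
    word_embs:    (batch, D, T)      embeddings for the T words of a caption
    proj:         nn.Linear mapping word space D -> image hidden space D_hat
    """
    # Project words into the common semantic space of the image features.
    words = proj(word_embs.transpose(1, 2))            # (batch, T, D_hat)

    # Similarity between every word and every sub-region.
    scores = torch.bmm(words, region_feats)            # (batch, T, N)

    # Softmax over words: how strongly each word conditions a sub-region.
    attn = F.softmax(scores, dim=1)                    # (batch, T, N)

    # Word-context vector per sub-region: attention-weighted sum of words.
    context = torch.bmm(words.transpose(1, 2), attn)   # (batch, D_hat, N)
    return context, attn


if __name__ == "__main__":
    # Illustrative, made-up sizes: 256-d word embeddings, 128-d hidden
    # features on a 17x17 feature map, an 18-word caption.
    proj = torch.nn.Linear(256, 128, bias=False)
    regions = torch.randn(2, 128, 17 * 17)
    words = torch.randn(2, 256, 18)
    ctx, attn = word_attention(regions, words, proj)
    print(ctx.shape, attn.shape)  # (2, 128, 289) and (2, 18, 289)
```

The deep attentional multimodal similarity model (DAMSM) mentioned in the abstract applies a similar region-word attention in the reverse direction to score how well an image matches its caption, and that score is used as a fine-grained image-text matching loss when training the generator.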
Author He, Xiaodong
Xu, Tao
Zhang, Han
Huang, Xiaolei
Zhang, Pengchuan
Huang, Qiuyuan
Gan, Zhe
Author_xml – sequence: 1
  givenname: Tao
  surname: Xu
  fullname: Xu, Tao
– sequence: 2
  givenname: Pengchuan
  surname: Zhang
  fullname: Zhang, Pengchuan
– sequence: 3
  givenname: Qiuyuan
  surname: Huang
  fullname: Huang, Qiuyuan
– sequence: 4
  givenname: Han
  surname: Zhang
  fullname: Zhang, Han
– sequence: 5
  givenname: Zhe
  surname: Gan
  fullname: Gan, Zhe
– sequence: 6
  givenname: Xiaolei
  surname: Huang
  fullname: Huang, Xiaolei
– sequence: 7
  givenname: Xiaodong
  surname: He
  fullname: He, Xiaodong
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/CVPR.2018.00143
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
EISBN 9781538664209
1538664208
EISSN 1063-6919
EndPage 1324
ExternalDocumentID 8578241
Genre orig-research
GroupedDBID 6IE
6IH
6IL
6IN
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IJVOP
OCL
RIE
RIL
RIO
IEDL.DBID RIE
IngestDate Wed Aug 27 02:52:16 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 9
ParticipantIDs ieee_primary_8578241
PublicationCentury 2000
PublicationDate 2018-Jun
PublicationDateYYYYMMDD 2018-06-01
PublicationDate_xml – month: 06
  year: 2018
  text: 2018-Jun
PublicationDecade 2010
PublicationTitle 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
PublicationTitleAbbrev CVPR
PublicationYear 2018
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0002683845
ssj0003211698
Score 2.630141
SourceID ieee
SourceType Publisher
StartPage 1316
SubjectTerms Computational modeling
Gallium nitride
Generative adversarial networks
Generators
Image generation
Semantics
Visualization
Title AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
URI https://ieeexplore.ieee.org/document/8578241
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2018+IEEE%2FCVF+Conference+on+Computer+Vision+and+Pattern+Recognition&rft.atitle=AttnGAN%3A+Fine-Grained+Text+to+Image+Generation+with+Attentional+Generative+Adversarial+Networks&rft.au=Xu%2C+Tao&rft.au=Zhang%2C+Pengchuan&rft.au=Huang%2C+Qiuyuan&rft.au=Zhang%2C+Han&rft.date=2018-06-01&rft.pub=IEEE&rft.eissn=1063-6919&rft.spage=1316&rft.epage=1324&rft_id=info:doi/10.1109%2FCVPR.2018.00143&rft.externalDocID=8578241