AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
Published in | 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1316 - 1324 |
Main Authors | Xu, Tao; Zhang, Pengchuan; Huang, Qiuyuan; Zhang, Han; Gan, Zhe; Huang, Xiaolei; He, Xiaodong |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.06.2018 |
Subjects | Computational modeling; Gallium nitride; Generative adversarial networks; Generators; Image generation; Semantics; Visualization |
Abstract | In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation. With a novel attentional generative network, the AttnGAN can synthesize fine-grained details at different sub-regions of the image by paying attention to the relevant words in the natural language description. In addition, a deep attentional multimodal similarity model is proposed to compute a fine-grained image-text matching loss for training the generator. The proposed AttnGAN significantly outperforms the previous state of the art, boosting the best reported inception score by 14.14% on the CUB dataset and 170.25% on the more challenging COCO dataset. A detailed analysis is also performed by visualizing the attention layers of the AttnGAN. For the first time, it shows that the layered attentional GAN is able to automatically select the condition at the word level for generating different parts of the image. |
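The word-level attention the abstract describes can be sketched as follows: each image sub-region attends over the words of the caption and receives a word-context vector that conditions its refinement. This is a minimal NumPy illustration of that idea, not the authors' implementation; the array names, shapes, and the plain dot-product similarity are assumptions.

```python
import numpy as np

def word_attention(regions, words):
    """Word-level attention for sub-region refinement (simplified sketch).

    regions: (N, D) hidden features, one row per image sub-region
    words:   (T, D) word features from the text encoder
    returns: (N, D) word-context matrix, (N, T) attention weights
    """
    scores = regions @ words.T                     # (N, T) region-word similarities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over words, per region
    context = attn @ words                         # weighted sum of word vectors
    return context, attn

# Toy example: 4 sub-regions, 3 words, 5-dim features (all values illustrative)
rng = np.random.default_rng(0)
ctx, attn = word_attention(rng.normal(size=(4, 5)), rng.normal(size=(3, 5)))
print(ctx.shape, attn.shape)  # (4, 5) (4, 3)
```

In the multi-stage setup the abstract outlines, a context matrix like `ctx` would be concatenated with the region features and fed to the next generator stage, so different parts of the image can be driven by different words.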
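The "fine-grained image-text matching loss" mentioned in the abstract rewards matched image-caption pairs over mismatched ones within a batch. The following is a simplified sketch in the spirit of such a multimodal similarity loss, not the paper's DAMSM code; the global (per-image, per-sentence) features, the temperature `gamma`, and its value are assumptions.

```python
import numpy as np

def matching_loss(img, txt, gamma=10.0):
    """Batch image-text matching loss (simplified sketch).

    img, txt: (B, B_dim) global image and sentence features, row i of
    each forming a matched pair. For each image, the matched caption
    should receive the highest softmax probability among the batch.
    """
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    sim = gamma * (img @ txt.T)                 # (B, B) scaled cosine similarities
    sim -= sim.max(axis=1, keepdims=True)       # numerical stability
    e = np.exp(sim)
    p = e / e.sum(axis=1, keepdims=True)        # softmax over captions, per image
    # negative log-probability of the matching caption for each image
    return -np.log(np.diag(p) + 1e-12).mean()
```

Minimizing this pushes the generator toward images whose features match their own caption better than any other caption in the batch; the paper's version additionally computes a word-level (attention-weighted) variant of the similarity.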
Author | He, Xiaodong; Xu, Tao; Zhang, Han; Huang, Xiaolei; Zhang, Pengchuan; Huang, Qiuyuan; Gan, Zhe |
Author_xml | – sequence: 1 givenname: Tao surname: Xu fullname: Xu, Tao
– sequence: 2 givenname: Pengchuan surname: Zhang fullname: Zhang, Pengchuan
– sequence: 3 givenname: Qiuyuan surname: Huang fullname: Huang, Qiuyuan
– sequence: 4 givenname: Han surname: Zhang fullname: Zhang, Han
– sequence: 5 givenname: Zhe surname: Gan fullname: Gan, Zhe
– sequence: 6 givenname: Xiaolei surname: Huang fullname: Huang, Xiaolei
– sequence: 7 givenname: Xiaodong surname: He fullname: He, Xiaodong |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DOI | 10.1109/CVPR.2018.00143 |
Discipline | Applied Sciences |
EISBN | 9781538664209; 1538664208 |
EISSN | 1063-6919 |
EndPage | 1324 |
ExternalDocumentID | 8578241 |
Genre | orig-research |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
PageCount | 9 |
PublicationCentury | 2000 |
PublicationDate | 2018-Jun |
PublicationDateYYYYMMDD | 2018-06-01 |
PublicationDate_xml | – month: 06 year: 2018 text: 2018-Jun |
PublicationDecade | 2010 |
PublicationTitle | 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition |
PublicationTitleAbbrev | CVPR |
PublicationYear | 2018 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
StartPage | 1316 |
SubjectTerms | Computational modeling; Gallium nitride; Generative adversarial networks; Generators; Image generation; Semantics; Visualization |
Title | AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks |
URI | https://ieeexplore.ieee.org/document/8578241 |