Task-Adaptive Attention for Image Captioning

Bibliographic Details
Published in IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 1, pp. 43–51
Main Authors Yan, Chenggang, Hao, Yiming, Li, Liang, Yin, Jian, Liu, Anan, Mao, Zhendong, Chen, Zhenyu, Gao, Xingyu
Format Journal Article
Language English
Published New York: IEEE, 01.01.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online Access Get full text

Abstract Attention mechanisms are now widely used in image captioning models, but most of them focus only on visual features. When generating syntax-related words, little visual information is needed, and in such cases these attention models can mislead word generation. In this paper, we propose a Task-Adaptive Attention module for image captioning that alleviates this misleading problem by learning implicit non-visual clues, which help in generating non-visual words. We further introduce a diversity regularization to enhance the expressive ability of the Task-Adaptive Attention module. Extensive experiments on the MSCOCO captioning dataset demonstrate that plugging our Task-Adaptive Attention module into a vanilla Transformer-based image captioning model yields a performance improvement.
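The abstract is the only technical description this record preserves, so the following PyTorch sketch illustrates the general idea rather than the authors' implementation: a hypothetical TaskAdaptiveAttention module appends a learnable non-visual "task" slot to the visual keys and values, letting the decoder route attention away from the image when producing syntax-related words, and a generic diversity penalty (the ||AAᵀ − I||²_F term popularized by structured self-attention) stands in for the paper's diversity regularization. All names, shapes, and the exact loss form are assumptions.

```python
import torch
import torch.nn as nn


class TaskAdaptiveAttention(nn.Module):
    """Hypothetical sketch: visual attention with an extra learnable
    non-visual slot that the decoder can attend to instead of the
    image when generating syntax-related (non-visual) words."""

    def __init__(self, d_model=512, num_heads=8, num_task_slots=1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Learnable embedding(s) standing in for implicit non-visual clues.
        self.task_slots = nn.Parameter(torch.randn(1, num_task_slots, d_model))

    def forward(self, query, visual_feats):
        # query: (B, T, d_model) decoder hidden states
        # visual_feats: (B, N, d_model) image region features
        slots = self.task_slots.expand(visual_feats.size(0), -1, -1)
        keys = torch.cat([visual_feats, slots], dim=1)  # (B, N + S, d_model)
        out, weights = self.attn(query, keys, keys)     # weights: (B, T, N + S)
        return out, weights


def diversity_regularizer(weights):
    """Generic diversity penalty (an assumption, not the paper's exact
    formula): push attention rows toward orthogonality so different
    decoding steps attend to different slots."""
    gram = weights @ weights.transpose(-2, -1)          # (B, T, T)
    eye = torch.eye(gram.size(-1), device=gram.device, dtype=gram.dtype)
    return ((gram - eye) ** 2).sum(dim=(-2, -1)).mean()


# Minimal usage: 36 region features, 10 decoding steps; the last
# attention column corresponds to the non-visual task slot.
module = TaskAdaptiveAttention()
q = torch.randn(2, 10, 512)
v = torch.randn(2, 36, 512)
out, w = module(q, v)                                   # w: (2, 10, 37)
loss_reg = diversity_regularizer(w)
```

With num_task_slots = 1 the module reduces to standard visual attention plus a single sentinel-like non-visual slot; the exact module design and regularizer should be taken from the paper itself.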
Author Yin, Jian
Gao, Xingyu
Li, Liang
Chen, Zhenyu
Hao, Yiming
Yan, Chenggang
Liu, Anan
Mao, Zhendong
Author_xml – sequence: 1
  givenname: Chenggang
  orcidid: 0000-0003-1204-0512
  surname: Yan
  fullname: Yan, Chenggang
  email: cgyan@sdu.edu.cn
  organization: School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai, China
– sequence: 2
  givenname: Yiming
  orcidid: 0000-0003-3225-5887
  surname: Hao
  fullname: Hao, Yiming
  email: m17863135918_1@163.com
  organization: School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai, China
– sequence: 3
  givenname: Liang
  orcidid: 0000-0001-8437-4824
  surname: Li
  fullname: Li, Liang
  email: liang.li@ict.ac.cn
  organization: Chinese Academy of Sciences, Institute of Computing Technology, Beijing, China
– sequence: 4
  givenname: Jian
  orcidid: 0000-0002-4820-0226
  surname: Yin
  fullname: Yin, Jian
  email: yinjian@sdu.edu.cn
  organization: School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai, China
– sequence: 5
  givenname: Anan
  orcidid: 0000-0001-5755-9145
  surname: Liu
  fullname: Liu, Anan
  email: anan0422@gmail.com
  organization: School of Electrical and Information Engineering, Tianjin University, Tianjin, China
– sequence: 6
  givenname: Zhendong
  orcidid: 0000-0001-5739-8126
  surname: Mao
  fullname: Mao, Zhendong
  email: zdmao@ustc.edu.cn
  organization: School of Information Science and Technology, University of Science and Technology of China, Hefei, China
– sequence: 7
  givenname: Zhenyu
  surname: Chen
  fullname: Chen, Zhenyu
  email: czy9907@gmail.com
  organization: Big Data Center, State Grid Corporation of China, Beijing, China
– sequence: 8
  givenname: Xingyu
  surname: Gao
  fullname: Gao, Xingyu
  email: gaoxingyu@ime.ac.cn
  organization: Chinese Academy of Sciences, Institute of Microelectronics, Beijing, China
CODEN ITCTEM
CitedBy_id crossref_primary_10_1109_TCSVT_2022_3183648
crossref_primary_10_1016_j_eswa_2022_117174
crossref_primary_10_1145_3648370
crossref_primary_10_1002_mp_17128
crossref_primary_10_1016_j_neucom_2023_126914
crossref_primary_10_3233_JIFS_233004
crossref_primary_10_1016_j_neucom_2024_128824
crossref_primary_10_1007_s00530_021_00873_8
crossref_primary_10_1109_TCSVT_2023_3282349
crossref_primary_10_1109_TMM_2023_3242142
crossref_primary_10_1109_TIM_2024_3353830
crossref_primary_10_1016_j_measurement_2025_117105
crossref_primary_10_1109_TCSVT_2024_3391415
crossref_primary_10_1016_j_engappai_2024_109105
crossref_primary_10_1186_s13640_022_00583_9
crossref_primary_10_1016_j_jvcir_2023_103897
crossref_primary_10_1109_TNNLS_2023_3283239
crossref_primary_10_1007_s00530_021_00862_x
crossref_primary_10_1007_s11042_024_18467_7
crossref_primary_10_1016_j_neucom_2024_128036
crossref_primary_10_1109_TNNLS_2023_3263846
crossref_primary_10_1016_j_ins_2024_121342
crossref_primary_10_1016_j_engappai_2023_106384
crossref_primary_10_1007_s12145_024_01479_0
crossref_primary_10_1007_s00138_023_01479_y
crossref_primary_10_1016_j_eswa_2025_126850
crossref_primary_10_1007_s11760_023_02769_8
crossref_primary_10_1109_TNNLS_2024_3358858
crossref_primary_10_1109_JAS_2022_105734
crossref_primary_10_3390_app122111280
crossref_primary_10_1109_TCSVT_2023_3296889
crossref_primary_10_1109_TIP_2022_3159472
crossref_primary_10_1016_j_neucom_2022_06_062
crossref_primary_10_32604_cmc_2024_054841
crossref_primary_10_1186_s13640_021_00564_4
crossref_primary_10_1109_TCSVT_2024_3382717
crossref_primary_10_1186_s13640_023_00608_x
crossref_primary_10_1109_TIM_2023_3285981
crossref_primary_10_1109_TCSVT_2024_3414275
crossref_primary_10_1109_TCSVT_2022_3211734
crossref_primary_10_1186_s13640_024_00627_2
crossref_primary_10_1016_j_displa_2024_102941
crossref_primary_10_1016_j_engappai_2024_109134
crossref_primary_10_1016_j_ins_2023_119992
crossref_primary_10_1109_TCSVT_2023_3336371
crossref_primary_10_1109_TCSVT_2022_3181490
crossref_primary_10_1007_s00530_022_00976_w
crossref_primary_10_1016_j_measurement_2024_114545
crossref_primary_10_1109_TCSVT_2023_3291379
crossref_primary_10_3390_s21237982
crossref_primary_10_1016_j_optlastec_2022_108469
crossref_primary_10_1109_TIM_2024_3374320
crossref_primary_10_1109_TMM_2024_3358948
crossref_primary_10_3390_insects14010054
crossref_primary_10_7717_peerj_cs_2725
crossref_primary_10_1145_3672397
crossref_primary_10_1109_TCSVT_2024_3465445
crossref_primary_10_1007_s00530_023_01172_0
crossref_primary_10_1016_j_neucom_2024_127530
crossref_primary_10_1016_j_neucom_2025_129710
crossref_primary_10_1007_s00138_023_01370_w
crossref_primary_10_1007_s10489_024_05389_y
crossref_primary_10_1111_coin_12653
crossref_primary_10_1016_j_ipm_2023_103288
crossref_primary_10_1016_j_micpro_2023_104931
crossref_primary_10_1109_TCSVT_2022_3221755
crossref_primary_10_1186_s13640_022_00589_3
crossref_primary_10_1109_TCSVT_2022_3233369
crossref_primary_10_3390_rs15051395
crossref_primary_10_1109_LSP_2021_3088323
crossref_primary_10_1109_TCSVT_2022_3207228
crossref_primary_10_1109_TCI_2024_3426975
crossref_primary_10_1145_3612926
crossref_primary_10_1007_s13042_023_01876_9
crossref_primary_10_1109_TCSVT_2024_3405998
crossref_primary_10_1049_ipr2_12790
crossref_primary_10_1109_TCSVT_2022_3225549
crossref_primary_10_1016_j_cviu_2024_104088
crossref_primary_10_1007_s11227_022_04594_1
crossref_primary_10_1109_TCSVT_2021_3104932
crossref_primary_10_1186_s40537_023_00693_9
crossref_primary_10_1007_s10489_022_03675_1
crossref_primary_10_1002_eng2_12785
crossref_primary_10_1007_s11831_024_10190_8
crossref_primary_10_1016_j_image_2024_117153
crossref_primary_10_1109_TCSVT_2023_3343520
crossref_primary_10_1016_j_measurement_2023_113240
crossref_primary_10_1109_TMM_2022_3177308
crossref_primary_10_1016_j_cropro_2024_107018
crossref_primary_10_1109_TCSVT_2022_3182426
crossref_primary_10_3390_rs16122161
crossref_primary_10_1007_s00530_021_00859_6
crossref_primary_10_1016_j_jvcir_2022_103628
crossref_primary_10_1007_s00530_021_00863_w
crossref_primary_10_1016_j_jvcir_2022_103740
crossref_primary_10_1016_j_measurement_2024_116385
crossref_primary_10_1109_TCSVT_2022_3178844
crossref_primary_10_1109_TMM_2023_3301279
crossref_primary_10_1109_TMM_2022_3202690
crossref_primary_10_1109_TCSVT_2023_3235704
crossref_primary_10_1016_j_jksuci_2024_102127
crossref_primary_10_1007_s11042_024_19410_6
crossref_primary_10_1016_j_engappai_2023_107732
crossref_primary_10_1016_j_engappai_2023_106406
crossref_primary_10_1109_TCSVT_2024_3358411
crossref_primary_10_1016_j_neucom_2022_07_048
crossref_primary_10_1109_LSP_2022_3177319
crossref_primary_10_1109_TMM_2023_3283878
crossref_primary_10_1109_TCSVT_2024_3497997
crossref_primary_10_1109_TCSVT_2022_3193857
crossref_primary_10_1016_j_ins_2023_119810
crossref_primary_10_3390_app14104231
crossref_primary_10_1016_j_knosys_2025_113127
crossref_primary_10_1016_j_jvcir_2022_103641
crossref_primary_10_1016_j_eswa_2025_126692
crossref_primary_10_3390_data7060080
crossref_primary_10_1016_j_engappai_2025_110358
crossref_primary_10_1016_j_neucom_2022_06_118
crossref_primary_10_1016_j_autcon_2024_105286
crossref_primary_10_1109_TCSVT_2024_3426655
crossref_primary_10_1016_j_dsp_2025_105155
crossref_primary_10_1016_j_measurement_2023_113714
crossref_primary_10_1109_TMI_2024_3507073
crossref_primary_10_1016_j_jvcir_2023_104019
crossref_primary_10_1007_s13042_023_02075_2
crossref_primary_10_1016_j_cviu_2024_104165
crossref_primary_10_1016_j_engappai_2025_110125
crossref_primary_10_1016_j_displa_2024_102798
crossref_primary_10_1109_TCSVT_2023_3281671
crossref_primary_10_1016_j_jvcir_2022_103672
crossref_primary_10_1109_TCSVT_2022_3155795
crossref_primary_10_1109_TCSVT_2022_3189242
crossref_primary_10_1016_j_autcon_2024_105298
crossref_primary_10_1016_j_entcom_2022_100511
crossref_primary_10_1007_s11277_024_11653_8
crossref_primary_10_1109_LSP_2023_3292028
crossref_primary_10_1109_TGRS_2023_3236154
crossref_primary_10_1016_j_dsp_2023_103987
crossref_primary_10_1016_j_jvcir_2023_103832
crossref_primary_10_1186_s13640_022_00588_4
crossref_primary_10_1016_j_jvcir_2023_103954
crossref_primary_10_1016_j_jvcir_2023_103836
crossref_primary_10_3389_fonc_2023_1115718
crossref_primary_10_1007_s00530_021_00841_2
crossref_primary_10_1109_TCSVT_2022_3199603
crossref_primary_10_1016_j_engappai_2024_108322
crossref_primary_10_1016_j_neucom_2024_127246
crossref_primary_10_1016_j_neucom_2024_127488
crossref_primary_10_1109_TCSVT_2023_3307554
crossref_primary_10_1016_j_neucom_2025_129683
crossref_primary_10_1016_j_jvcir_2023_103984
crossref_primary_10_1016_j_jvcir_2024_104316
crossref_primary_10_1007_s00530_024_01608_1
crossref_primary_10_1007_s00138_021_01224_3
crossref_primary_10_1007_s00530_023_01249_w
crossref_primary_10_32604_cmes_2025_059192
crossref_primary_10_1007_s00371_023_03104_5
crossref_primary_10_1007_s11042_024_18150_x
crossref_primary_10_1016_j_engappai_2023_107144
crossref_primary_10_1109_LSP_2024_3524120
crossref_primary_10_3390_rs16173330
crossref_primary_10_3390_jimaging9070130
crossref_primary_10_1016_j_engappai_2025_110546
crossref_primary_10_1016_j_jvcir_2023_104021
crossref_primary_10_1016_j_jvcir_2024_104328
crossref_primary_10_1016_j_measurement_2023_113867
crossref_primary_10_1016_j_neucom_2024_127814
crossref_primary_10_1016_j_image_2023_117018
crossref_primary_10_1007_s11042_022_13793_0
crossref_primary_10_1016_j_imavis_2024_105105
crossref_primary_10_1016_j_neucom_2025_129668
crossref_primary_10_1007_s12021_022_09579_2
crossref_primary_10_3390_app14062657
crossref_primary_10_1145_3576927
crossref_primary_10_1109_TCSVT_2023_3344569
crossref_primary_10_1016_j_eswa_2025_126943
crossref_primary_10_1109_TCSVT_2023_3287296
crossref_primary_10_1007_s11063_022_11106_y
crossref_primary_10_1109_TCSVT_2023_3315133
crossref_primary_10_1109_LSP_2021_3125828
crossref_primary_10_1111_exsy_13474
crossref_primary_10_1016_j_ins_2024_121191
crossref_primary_10_1145_3671000
crossref_primary_10_1109_TNNLS_2022_3176611
crossref_primary_10_1016_j_neucom_2022_10_004
crossref_primary_10_1109_TCSVT_2024_3445337
crossref_primary_10_1109_TIP_2023_3288986
crossref_primary_10_1016_j_neunet_2022_03_034
crossref_primary_10_1007_s00371_023_02794_1
crossref_primary_10_1016_j_measurement_2024_114239
crossref_primary_10_1186_s13640_021_00566_2
crossref_primary_10_1016_j_jvcir_2024_104346
crossref_primary_10_1016_j_jvcir_2023_103993
crossref_primary_10_1109_TMM_2024_3407695
crossref_primary_10_1016_j_engappai_2023_107489
crossref_primary_10_1016_j_jvcir_2024_104227
crossref_primary_10_1109_TCSVT_2022_3216663
crossref_primary_10_1109_TCSVT_2023_3243725
crossref_primary_10_1109_LSP_2024_3438080
crossref_primary_10_1145_3638558
crossref_primary_10_1016_j_measurement_2025_116716
crossref_primary_10_1080_13682199_2025_2470485
crossref_primary_10_1109_TCSVT_2024_3425513
crossref_primary_10_1016_j_ins_2023_119277
crossref_primary_10_1016_j_measurement_2024_114242
crossref_primary_10_32604_cmc_2025_060788
crossref_primary_10_1007_s11042_022_12776_5
crossref_primary_10_1016_j_imavis_2022_104595
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
DOI 10.1109/TCSVT.2021.3067449
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Discipline Engineering
EISSN 1558-2205
EndPage 51
ExternalDocumentID 10_1109_TCSVT_2021_3067449
9381876
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 61771457; 61732007; 61931008; 61672497; 61971268; 61772494; 62022083
  funderid: 10.13039/501100001809
– fundername: National Key Research and Development Program of China
  grantid: 2020YFB1406604
  funderid: 10.13039/501100012166
ISSN 1051-8215
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0003-3225-5887
0000-0001-5739-8126
0000-0001-8437-4824
0000-0001-5755-9145
0000-0003-1204-0512
0000-0002-4820-0226
PQID 2619024178
PQPubID 85433
PageCount 9
PublicationCentury 2000
PublicationDate 2022-Jan.
2022-1-00
20220101
PublicationDateYYYYMMDD 2022-01-01
PublicationDate_xml – month: 01
  year: 2022
  text: 2022-Jan.
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on circuits and systems for video technology
PublicationTitleAbbrev TCSVT
PublicationYear 2022
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 43
SubjectTerms Adaptation models
attention mechanism
Computational modeling
Decoding
Feature extraction
Feeds
Image captioning
Modules
Regularization
Task analysis
transformer
Visualization
Title Task-Adaptive Attention for Image Captioning
URI https://ieeexplore.ieee.org/document/9381876
https://www.proquest.com/docview/2619024178
Volume 32