Working Memory Connections for LSTM

Bibliographic Details
Published in: Neural Networks, Vol. 144, pp. 334-341
Main Authors: Landi, Federico; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita
Format: Journal Article
Language: English
Published: United States: Elsevier Ltd, December 2021
Subjects: Gated RNNs; Long Short-Term Memory networks; Cell-to-gate connections; Image captioning; Language modeling; Knowledge; Learning; Neural Networks, Computer
ISSN: 0893-6080
EISSN: 1879-2782
DOI: 10.1016/j.neunet.2021.08.030

Abstract
Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted, being the de facto standard for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the gate potential by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists of adding a learnable nonlinear projection of the cell content into the network gates. This modification can fit into the classical LSTM gates without any assumption on the underlying task, and is particularly effective when dealing with longer sequences. Previous research efforts in this direction, which go back to the early 2000s, could not bring a consistent improvement over the vanilla LSTM. As part of this paper, we identify a key issue tied to previous connections that heavily limits their effectiveness, hence preventing a successful integration of the knowledge coming from the internal cell state. We show through extensive experimental evaluation that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks. Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
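The abstract describes the architectural change only at a high level: each LSTM gate additionally receives a learnable nonlinear projection of the cell content. The sketch below is a minimal reading of that description, not the authors' implementation; the parameter names (W_m, W_mi, and so on), the choice of tanh for the projection, and the decision to feed a projection of the previous cell state to all three gates are assumptions made here for illustration. The exact formulation of Working Memory Connections is given in the paper itself.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_wmc_step(x, h_prev, c_prev, p):
    """One LSTM step with a hypothetical cell-to-gate 'working memory' term."""
    # Learnable nonlinear projection of the cell content (tanh assumed here).
    m = np.tanh(p["W_m"] @ c_prev)

    # Gates see the input, the previous hidden state, and the projected cell state.
    i = sigmoid(p["W_xi"] @ x + p["W_hi"] @ h_prev + p["W_mi"] @ m + p["b_i"])
    f = sigmoid(p["W_xf"] @ x + p["W_hf"] @ h_prev + p["W_mf"] @ m + p["b_f"])
    o = sigmoid(p["W_xo"] @ x + p["W_ho"] @ h_prev + p["W_mo"] @ m + p["b_o"])
    g = np.tanh(p["W_xg"] @ x + p["W_hg"] @ h_prev + p["b_g"])

    c = f * c_prev + i * g      # standard LSTM cell update
    h = o * np.tanh(c)          # standard LSTM output
    return h, c

# Toy usage with random parameters (shapes only; no training loop).
rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
names = ["W_xi", "W_hi", "W_mi", "W_xf", "W_hf", "W_mf",
         "W_xo", "W_ho", "W_mo", "W_xg", "W_hg", "W_m"]
p = {k: rng.normal(scale=0.1,
                   size=(n_hid, n_in if k.startswith("W_x") else n_hid))
     for k in names}
p.update({b: np.zeros(n_hid) for b in ["b_i", "b_f", "b_o", "b_g"]})

h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):
    h, c = lstm_wmc_step(rng.normal(size=n_in), h, c, p)
print(h.shape, c.shape)   # (8,) (8,)

Dropping the W_m / W_m* terms recovers the standard LSTM step, which is the sense in which the abstract says the modification "can fit into the classical LSTM gates".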
Authors
– Landi, Federico (ORCID: 0000-0003-2092-1934; email: federico.landi@unimore.it)
– Baraldi, Lorenzo (ORCID: 0000-0001-5125-4957)
– Cornia, Marcella (ORCID: 0000-0001-9640-9385)
– Cucchiara, Rita (ORCID: 0000-0002-2239-283X)
PubMed: 34547671 (https://www.ncbi.nlm.nih.gov/pubmed/34547671)
Copyright © 2021 Elsevier Ltd. All rights reserved.
Keywords: Gated RNNs; Long Short-Term Memory networks; Cell-to-gate connections; Image captioning; Language modeling