Working Memory Connections for LSTM
Published in: Neural Networks, Vol. 144, pp. 334–341
Main Authors: Landi, Federico; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita
Format: Journal Article
Language: English
Published: United States: Elsevier Ltd, 01.12.2021
ISSN: 0893-6080
EISSN: 1879-2782
DOI: 10.1016/j.neunet.2021.08.030
Abstract: Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted, being the de facto standard for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the gate potential by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists of adding a learnable nonlinear projection of the cell content into the network gates. This modification can fit into the classical LSTM gates without any assumption on the underlying task, and is particularly effective when dealing with longer sequences. Previous research efforts in this direction, which go back to the early 2000s, could not bring a consistent improvement over the vanilla LSTM. As part of this paper, we identify a key issue tied to those earlier connections that heavily limits their effectiveness, hence preventing a successful integration of the knowledge coming from the internal cell state. We show through extensive experimental evaluation that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks. Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
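The abstract's core idea, feeding a bounded, learnable projection of the cell state into the LSTM gates, can be sketched as a single recurrent step. The following is a minimal NumPy sketch based only on the abstract's description: the function name `lstm_wmc_step`, the projection matrix `V`, and the exact placement of the tanh (applied to `c_prev` before the projection, for all gates) are illustrative assumptions, not the paper's verbatim formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_wmc_step(x, h_prev, c_prev, W, U, V, b):
    """One LSTM step with a working-memory-style cell-to-gate term.

    Shapes (H = hidden size, D = input size):
        x: (D,)  h_prev, c_prev: (H,)
        W: (4H, D)  U: (4H, H)  V: (4H, H)  b: (4H,)
    """
    H = c_prev.shape[0]
    # Extra term vs. a vanilla LSTM: a learnable projection V of the
    # *bounded* cell content. The tanh keeps the cell's contribution in
    # (-1, 1); older peephole-style connections fed the raw, unbounded
    # cell state into the gates.
    pre = W @ x + U @ h_prev + V @ np.tanh(c_prev) + b   # (4H,)
    i = sigmoid(pre[:H])           # input gate
    f = sigmoid(pre[H:2 * H])      # forget gate
    o = sigmoid(pre[2 * H:3 * H])  # output gate
    g = np.tanh(pre[3 * H:])       # candidate cell update
    c_new = f * c_prev + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Setting `V` to zeros recovers the standard LSTM step, which is consistent with the claim that the modification "can fit into the classical LSTM gates without any assumption on the underlying task".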
Authors: Federico Landi (ORCID 0000-0003-2092-1934; federico.landi@unimore.it); Lorenzo Baraldi (ORCID 0000-0001-5125-4957); Marcella Cornia (ORCID 0000-0001-9640-9385); Rita Cucchiara (ORCID 0000-0002-2239-283X)
Copyright: © 2021 Elsevier Ltd. All rights reserved.
Discipline: Computer Science
Peer Reviewed: Yes
Scholarly Journal: Yes
Keywords: Gated RNNs; Long Short-Term Memory networks; Cell-to-gate connections; Image captioning; Language modeling
PMID: 34547671
Page Count: 8
Snippet | Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning... |
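The snippet above describes the paper's core idea: letting the LSTM gates read a learnable nonlinear projection of the cell state ("Working Memory Connections"). Below is a minimal NumPy sketch of that mechanism. The parameter names (`Vf`, `Vi`, `Vo`), the `tanh` projection, and the choice of which cell state each gate reads are illustrative assumptions based on the abstract, not the authors' reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(nx, nh):
    """Random parameters for one LSTM cell with Working Memory
    Connections. The V_* matrices are the extra cell-to-gate
    projections; all names here are illustrative."""
    p = {}
    for g in ("f", "i", "g", "o"):
        p["W" + g] = rng.standard_normal((nh, nx)) * 0.1  # input weights
        p["U" + g] = rng.standard_normal((nh, nh)) * 0.1  # recurrent weights
        p["b" + g] = np.zeros(nh)
    for g in ("f", "i", "o"):  # the candidate update has no gate, so no V
        p["V" + g] = rng.standard_normal((nh, nh)) * 0.1  # cell-to-gate weights
    return p

def lstm_step_wmc(x, h_prev, c_prev, p):
    """One LSTM step whose gates also receive tanh(V c), a learnable
    nonlinear projection of the cell state (assumed formulation)."""
    f = sigmoid(p["Wf"] @ x + p["Uf"] @ h_prev + np.tanh(p["Vf"] @ c_prev) + p["bf"])
    i = sigmoid(p["Wi"] @ x + p["Ui"] @ h_prev + np.tanh(p["Vi"] @ c_prev) + p["bi"])
    g = np.tanh(p["Wg"] @ x + p["Ug"] @ h_prev + p["bg"])
    c = f * c_prev + i * g
    # assumption: the output gate reads the freshly updated cell state
    o = sigmoid(p["Wo"] @ x + p["Uo"] @ h_prev + np.tanh(p["Vo"] @ c) + p["bo"])
    h = o * np.tanh(c)
    return h, c

p = init_params(nx=4, nh=8)
h = c = np.zeros(8)
for _ in range(5):  # run a few steps on random inputs
    h, c = lstm_step_wmc(rng.standard_normal(4), h, c, p)
print(h.shape)  # → (8,)
```

Note the contrast with classical peephole connections, which feed the raw cell state into the gates; the abstract attributes the improvement to the learnable nonlinear (`tanh`) projection of the cell content.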
StartPage | 334 |
SubjectTerms | Cell-to-gate connections; Gated RNNs; Image captioning; Knowledge; Language modeling; Learning; Long Short-Term Memory networks; Neural Networks, Computer
Title | Working Memory Connections for LSTM |
URI | https://dx.doi.org/10.1016/j.neunet.2021.08.030 https://www.ncbi.nlm.nih.gov/pubmed/34547671 https://www.proquest.com/docview/2575370746 |
Volume | 144 |
Main Authors | Landi, Federico; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita