A generalized LSTM-like training algorithm for second-order recurrent neural networks

Bibliographic Details
Published in: Neural Networks, Vol. 25, No. 1, pp. 70-83
Main Authors: Monner, Derek; Reggia, James A.
Format: Journal Article
Language: English
Published: Kidlington: Elsevier Ltd, 01.01.2012

Summary: The Long Short-Term Memory (LSTM) is a second-order recurrent neural network architecture that excels at storing sequential short-term memories and retrieving them many time-steps later. LSTM’s original training algorithm provides the important properties of spatial and temporal locality, which are missing from other training approaches, at the cost of limiting its applicability to a small set of network architectures. Here we introduce the Generalized Long Short-Term Memory (LSTM-g) training algorithm, which provides LSTM-like locality while being applicable without modification to a much wider range of second-order network architectures. With LSTM-g, all units have an identical set of operating instructions for both activation and learning, subject only to the configuration of their local environment in the network; this is in contrast to the original LSTM training algorithm, where each type of unit has its own activation and training instructions. When applied to LSTM architectures with peephole connections, LSTM-g takes advantage of an additional source of back-propagated error which can enable better performance than the original algorithm. Enabled by the broad architectural applicability of LSTM-g, we demonstrate that training recurrent networks engineered for specific tasks can produce better results than single-layer networks. We conclude that LSTM-g has the potential to both improve the performance and broaden the applicability of spatially and temporally local gradient-based training algorithms for recurrent neural networks.
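
Background note: the summary's claim about peephole connections refers to extra weighted links that let each gate read the cell state directly, giving LSTM-g an additional path for back-propagated error. As context only, below is a minimal NumPy sketch of one forward step of a standard peephole LSTM cell (in the style of Gers and Schmidhuber); it does not reproduce the paper's LSTM-g algorithm, and the names lstm_step, W, U, P, and b are assumptions of this sketch.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, P, b):
    # Illustrative peephole LSTM step; not the paper's LSTM-g algorithm.
    # W/U/b: input/recurrent weights and biases for gates i, f, g, o;
    # P: peephole weights letting gates i, f, o see the cell state.
    # Input and forget gates peek at the *previous* cell state.
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + P['i'] * c_prev + b['i'])
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + P['f'] * c_prev + b['f'])
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])  # cell candidate
    c = f * c_prev + i * g                              # updated cell state
    # The output gate peeks at the freshly *updated* cell state.
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + P['o'] * c + b['o'])
    h = o * np.tanh(c)
    return h, c

# Example usage with small random weights (shapes are the only requirement):
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W = {k: 0.1 * rng.normal(size=(n_hid, n_in)) for k in 'ifgo'}
U = {k: 0.1 * rng.normal(size=(n_hid, n_hid)) for k in 'ifgo'}
P = {k: 0.1 * rng.normal(size=n_hid) for k in 'ifo'}
b = {k: np.zeros(n_hid) for k in 'ifgo'}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, P, b)

In the original LSTM training algorithm, error does not flow back through these peephole links; the summary's point is that LSTM-g's uniform local learning rule uses them as an additional error source.
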
ISSN: 0893-6080
EISSN: 1879-2782
DOI: 10.1016/j.neunet.2011.07.003