Digit-Serial DA-Based Fixed-Point RNNs: A Unified Approach for Enhancing Architectural Efficiency
Published in | IEEE Transactions on Neural Networks and Learning Systems, Vol. PP, pp. 1 - 15 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | United States: IEEE, 22.07.2024 |
Summary: | The next crucial step in artificial intelligence involves integrating neural network models into embedded and mobile systems. This requires designing compact and energy-efficient neural network models in silicon for optimized performance. This article introduces a unified approach for enhancing the architectural efficiency of long short-term memory (LSTM) recurrent neural networks (RNNs). Specifically, two new structures (I and II) based on the two's complement (TC) digit-serial distributed arithmetic (DSDA) technique are presented. The block-circulant matrix-vector multiplications (MVMs) and element-wise multiplications (EWMs) are formulated using TC DSDA. In addition, a fixed-point (FxP) training procedure for quantized LSTM RNNs is considered and validated for speech recognition tasks. Both structures leverage the circular rotation of weights and generate partial products with input digit slices. A new partial-product generator (PPG) and partial-product selector (PPS), designed to work with both unsigned and signed digits, are introduced. In Structure I, a nonpipelined MVM is realized with a few PPGs and PPSs, followed by a shift-accumulate unit (SAU). Conversely, in Structure II, a suitably chosen depth-pipelined MVM is achieved with multiple PPGs and PPSs, followed by a shift-to-add tree (SAT). A critical path delay (CPD) analysis for both the proposed structures is also presented.
Compared with previous works, post-synthesis results on 28-nm fully depleted silicon-on-insulator (FDSOI) technology reveal that, for a model size of 128 × 128, Structure I is 39.87% more area-efficient and 95.63% more energy-efficient, and Structure II is 30.95% more area-efficient and 91.18% more energy-efficient. |
---|---|
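The two's-complement digit-serial arithmetic described in the summary can be illustrated with a small software sketch. This is a simplification, not the paper's hardware: the per-digit sum over weights stands in for the LUT-based PPG/PPS, and the shift-add loop stands in for the SAU. Word length, digit width, and all function names here are illustrative assumptions.

```python
def to_tc_digits(x, width=8, d=2):
    """Slice a two's-complement integer into width // d digits of d bits,
    least-significant digit first; the top digit is reinterpreted as signed
    (the TC handling of the sign digit)."""
    assert width % d == 0
    u = x & ((1 << width) - 1)                    # two's-complement bit pattern
    digits = [(u >> (i * d)) & ((1 << d) - 1) for i in range(width // d)]
    if digits[-1] >= 1 << (d - 1):                # most-significant digit is signed
        digits[-1] -= 1 << d
    return digits

def dsda_dot(weights, xs, width=8, d=2):
    """Digit-serial DA dot product: for each digit position (MSD first),
    form a partial product across all inputs, then shift-accumulate."""
    digit_lists = [to_tc_digits(x, width, d) for x in xs]
    acc = 0
    for i in reversed(range(width // d)):
        # In hardware this sum would be read from a precomputed table of
        # weight combinations selected by the digit slice (PPG + PPS role).
        pp = sum(w * dl[i] for w, dl in zip(weights, digit_lists))
        acc = (acc << d) + pp                     # SAU: shift by digit size, add
    return acc
```

For example, `dsda_dot([3, -2], [-5, 7])` recovers the ordinary dot product `3*(-5) + (-2)*7 = -29`, with each input consumed two bits per step rather than as a full word.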
ISSN: | 2162-237X 2162-2388 |
DOI: | 10.1109/TNNLS.2024.3425569 |