FPGA Hardware Implementation of Efficient Long Short-Term Memory Network Based on Construction Vector Method

Bibliographic Details
Published in	IEEE Access, Vol. 11, pp. 122357-122367
Main Authors	Li, Tengfei; Gu, Shenshen
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2023
Subjects	Accuracy; Cognitive tasks; Compression ratio; Computational modeling; construct vector method; Design; Field programmable gate arrays; Field-programmable gate array (FPGA); Hardware; High level synthesis; Linear transformations; Logic gates; Long short term memory; long short-term memory (LSTM); Machine translation; Mathematical models; Matrix methods; model compression; Pipelining (computers); Sparse matrices; Speech recognition; Training
Summary:	Long Short-Term Memory (LSTM) networks and their variants have been widely adopted in sequential learning tasks such as speech recognition and machine translation. The low-latency and energy-efficiency requirements of real-world applications make model compression and hardware acceleration for LSTM an urgent need. In this paper, we first propose a weight parameter generation method based on vector construction that gives the model a higher compression ratio with less accuracy loss. We then study in detail how the size of the construction vector affects computational complexity, model compression ratio, and accuracy, in order to obtain the optimal design interval for that size. We also design a linear transformation method and a convolution method that reduce the dimension of the input sequence, so that the model can be applied to training sets of different dimensions without changing the size of the construction vector. Finally, we use high-level synthesis (HLS) to deploy the resulting LSTM inference model on an FPGA device, using parallel pipelined operation to reuse resources. Experiments show that, compared with the block circulant matrix method, the designs generated by our framework achieve up to 2 times higher compression at the same accuracy degradation, with acceptable latency. At the same compression ratio, our accuracy degradation is 45% of that of the block circulant matrix method.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2023.3329048
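
Note on the baseline named in the summary: the block circulant matrix method is a general, well-known compression scheme in which each b x b block of a weight matrix is a circulant matrix defined by a single length-b vector, so a block stores b values instead of b*b. The sketch below illustrates that baseline only; it is not the paper's construction vector method, whose details this record does not give. All function names and the (p, q, b) layout are assumptions chosen for illustration.

```python
import numpy as np

def circulant(c):
    """Dense b x b circulant matrix whose first column is c."""
    b = len(c)
    # Entry (i, j) is c[(i - j) mod b].
    idx = (np.arange(b)[:, None] - np.arange(b)[None, :]) % b
    return c[idx]

def dense_from_vectors(vectors):
    """Expand defining vectors of shape (p, q, b) into the full (p*b, q*b) matrix."""
    p, q, _ = vectors.shape
    return np.block([[circulant(vectors[i, j]) for j in range(q)]
                     for i in range(p)])

def block_circulant_matvec(vectors, x):
    """Compute W @ x without building W.

    Each circulant block acts as a circular convolution, which becomes an
    elementwise product in the FFT domain.
    """
    p, q, b = vectors.shape
    fx = np.fft.fft(x.reshape(q, b), axis=-1)   # (q, b)
    fc = np.fft.fft(vectors, axis=-1)           # (p, q, b)
    # Multiply per block, sum over the q column blocks, transform back.
    y = np.fft.ifft((fc * fx[None, :, :]).sum(axis=1), axis=-1).real
    return y.reshape(p * b)

def compression_ratio(p, q, b):
    """Dense parameter count divided by stored parameter count."""
    return (p * b * q * b) / (p * q * b)
```

Under these assumptions the compression ratio equals the block size b, which is the quantity the summary's comparison ("up to 2 times higher compression at the same accuracy degradation") is measured against.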