FPGA Hardware Implementation of Efficient Long Short-Term Memory Network Based on Construction Vector Method

Bibliographic Details
Published in IEEE Access, Vol. 11, pp. 122357-122367
Main Authors Li, Tengfei; Gu, Shenshen
Format Journal Article
Language English
Published Piscataway IEEE 2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Long Short-Term Memory (LSTM) networks and their variants have been widely adopted in sequential learning tasks such as speech recognition and machine translation. The low-latency and energy-efficiency requirements of real-world applications make model compression and hardware acceleration for LSTM an urgent need. In this paper, we first propose a weight parameter generation method based on vector construction that gives the model a higher compression ratio with less accuracy loss. Furthermore, we study in detail how the size of the construction vector affects computational complexity, model compression ratio, and accuracy, in order to obtain the optimal size design interval. Moreover, we design a linear transformation method and a convolution method to reduce the dimension of the input sequence, so that the model can be applied to training sets of different dimensions without changing the size of the construction vector. Finally, we use high-level synthesis (HLS) to deploy the resulting LSTM inference model on an FPGA device, and use parallel pipelined operation to reuse resources. Experiments show that, compared with the block circulant matrix method, the designs generated by our framework achieve up to 2 times higher compression at the same accuracy degradation, with acceptable latency. At the same compression ratio, our accuracy degradation is 45% of that of the block circulant matrix method.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3329048
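
The Summary compares against the block circulant matrix method but does not define it. As background only, the following is a minimal sketch of that baseline as it is commonly described: the weight matrix is split into b x b circulant blocks, each stored as a single length-b vector, so storage shrinks by a factor of b and each block product becomes an FFT-based circular convolution. This does not reproduce the authors' construction vector method, which this record does not describe. All function names, shapes, and the NumPy implementation are assumptions made for illustration.

```python
import numpy as np

def circulant(c):
    """Dense b x b circulant matrix whose first column is c."""
    b = len(c)
    idx = (np.arange(b)[:, None] - np.arange(b)[None, :]) % b
    return c[idx]

def dense_from_blocks(w):
    """Expand defining vectors w of shape (p, q, b) into a (p*b, q*b) matrix."""
    p, q, _ = w.shape
    return np.block([[circulant(w[i, j]) for j in range(q)] for i in range(p)])

def block_circulant_matvec(w, x):
    """Multiply the block circulant matrix defined by w with x using FFTs.

    w: (p, q, b) array, one defining vector per block (hypothetical layout).
    x: (q*b,) input vector. Returns a (p*b,) output vector.
    """
    p, q, b = w.shape
    xf = np.fft.fft(x.reshape(q, b), axis=1)   # spectrum of each input chunk
    wf = np.fft.fft(w, axis=2)                 # spectrum of each block vector
    yf = (wf * xf[None, :, :]).sum(axis=1)     # accumulate over column blocks
    return np.fft.ifft(yf, axis=1).real.reshape(p * b)

def compression_ratio(p, q, b):
    """Dense parameter count divided by stored parameter count."""
    return (p * q * b * b) / (p * q * b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((2, 3, 4))         # 8 x 12 matrix, block size 4
    x = rng.standard_normal(12)
    print(np.allclose(dense_from_blocks(w) @ x, block_circulant_matvec(w, x)))
    print(compression_ratio(2, 3, 4))
```

In this formulation the compression ratio equals the block size b, so raising it means using larger blocks. That trade-off between compression and accuracy is the one the Summary's comparison refers to.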