Two Distributed Arithmetic Based High Throughput Architectures of Non-Pipelined LMS Adaptive Filters

Bibliographic Details
Published in	IEEE Access, Vol. 10, pp. 76693-76706
Main Authors	Khan, Mohd. Tasleem; Alhartomi, Mohammed A.; Alzahrani, Saeed; Shaik, Rafi Ahamed; Alsulami, Ruwaybih
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2022
Subjects	Adaptive filter (ADF); Adaptive filters; Algorithms; Arithmetic; Binary codes; Convergence; Decomposition; distributed arithmetic (DA); finite-impulse response (FIR); Fixed point arithmetic; Hardware; least mean square (LMS); look-up table (LUT); Lookup tables; Steady-state; Table lookup; Throughput; Very large scale integration
Summary:	Distributed arithmetic (DA) is an efficient look-up table (LUT) based approach. The throughput of a DA-based implementation is limited by the LUT size. This paper presents two high-throughput architectures (Type I and Type II) of non-pipelined DA-based least-mean-square (LMS) adaptive filters (ADFs) using two's complement (TC) and offset-binary coding (OBC), respectively. We formulate the LMS algorithm using the steepest-descent approach, note a possible extension to its power-normalized version, and then present its convergence properties. The coefficient update equation of the LMS algorithm is then transformed via TC DA and OBC DA to design and develop non-pipelined architectures of ADFs. The proposed structures employ the LUT pre-decomposition technique to increase throughput. This technique enables the same mapping scheme to be used for concurrent update of the decomposed LUTs. An efficient fixed-point quantization model for evaluating the proposed structures from a realistic point of view is also presented. It is found that the Type II structure provides higher throughput than the Type I structure at the expense of a slower convergence rate, with almost the same steady-state mean square error. Unlike existing non-pipelined LMS ADFs, the proposed structures offer very high throughput, especially with large-order DA base units. Furthermore, they perform fewer additions in every filter cycle. 
Based on the simulation results, it is found that a 256th-order filter with an 8th-order DA base unit using the Type I structure provides 9.41× higher throughput, while the Type II structure provides 16.68× higher throughput, as compared to the best existing design. Synthesis results show that a 32nd-order filter with an 8th-order DA base unit using the Type I structure achieves 38.76% less minimum sampling period (MSP), occupies 28.62% more area, consumes 67.18% more power, utilizes 49.06% more slice LUTs and 3.31% more flip-flops (FFs), whereas the Type II structure achieves 51.25% less MSP, occupies 21.42% more area, consumes 47.84% more power, utilizes 29.10% more slice LUTs and 1.47% fewer FFs, as compared to the best existing design.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2022.3192619
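
The summary describes DA as a LUT-based approach whose LUT is decomposed into smaller base units. The following minimal Python sketch illustrates only the generic two's complement DA inner product with a decomposed LUT; it is not the paper's Type I or Type II architecture, and it omits the adaptive LUT update and the OBC variant. The function names (`build_lut`, `da_inner_product`) and the parameters `bits` and `base` are hypothetical, chosen for illustration, and integer weights and samples are assumed.

```python
def build_lut(weights):
    """Precompute all 2^P partial sums of P weights.

    Address bit k set means weights[k] is included in the sum.
    """
    lut = [0] * (1 << len(weights))
    for addr in range(1, len(lut)):
        low = addr & -addr                 # lowest set bit of the address
        k = low.bit_length() - 1           # index of the weight it selects
        lut[addr] = lut[addr ^ low] + weights[k]
    return lut


def da_inner_product(weights, samples, bits=8, base=4):
    """Compute sum(w * x) bit-serially using decomposed LUTs.

    Assumes integer samples that fit in `bits`-bit two's complement and a
    filter length that is a multiple of the base-unit size `base`.
    """
    assert len(weights) == len(samples)
    assert len(weights) % base == 0
    mask = (1 << bits) - 1

    # One small LUT (2^base entries) per base unit instead of one 2^N LUT.
    luts = [build_lut(weights[i:i + base])
            for i in range(0, len(weights), base)]

    acc = 0
    for b in range(bits):
        partial = 0
        for u, lut in enumerate(luts):
            addr = 0
            for k in range(base):
                x = samples[u * base + k] & mask   # two's complement pattern
                addr |= ((x >> b) & 1) << k        # bit b of each sample
            partial += lut[addr]
        term = partial * (1 << b)
        # The most significant bit carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc
```

With `base=4`, an 8-tap filter uses two 16-entry LUTs rather than a single 256-entry LUT, which is the kind of size reduction that LUT decomposition targets; the result can be checked against a direct `sum(w * x for w, x in zip(weights, samples))`.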