Resource-efficient acceleration of 2-dimensional Fast Fourier Transform computations on FPGAs

Bibliographic Details
Published in: 2009 Third ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), pp. 1-8
Main Authors: Kee, Hojin; Bhattacharyya, S.S.; Petersen, N.; Kornerup, J.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.08.2009
Summary: The 2-dimensional (2D) fast Fourier transform (FFT) is a fundamental, computationally intensive function that is of broad relevance to distributed smart camera systems. In this paper, we develop a systematic method for improving the throughput of 2D-FFT implementations on field-programmable gate arrays (FPGAs). Our method is based on a novel loop unrolling technique for FFT implementation, which is extended from our recent work on FPGA architectures for 1D-FFT implementation. This unrolling technique deploys multiple processing units within a single 1D-FFT core to achieve efficient configurations of data parallelism while minimizing memory space requirements and FPGA slice consumption. Furthermore, using our techniques for parallel processing within individual 1D-FFT cores, the number of input/output (I/O) ports within a given 1D-FFT core is limited to one input port and one output port. In contrast, previous 2D-FFT design approaches require multiple I/O pairs with multiple FFT cores. This streamlining of 1D-FFT interfaces makes it possible to avoid complex interconnection networks and associated scheduling logic for connecting multiple I/O ports from 1D-FFT cores to the I/O channel of external memory devices. Hence, our proposed unrolling technique maximizes the ratio of the achieved throughput to the consumed FPGA resources under pre-defined constraints on I/O channel bandwidth. To provide generality, our framework for 2D-FFT implementation can be efficiently parameterized in terms of key design parameters such as the transform size and I/O data word length.
DOI: 10.1109/ICDSC.2009.5289356
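
Illustrative sketch (not from the paper): the summary describes building a 2D-FFT from 1D-FFT cores that each expose one input and one output. A minimal software analogue, assuming the standard row-column decomposition (which the summary does not spell out), is shown below. The names `fft_1d`, `fft_2d`, `transpose`, and `dft_2d_direct` are hypothetical, and the code models neither the authors' loop unrolling technique nor any FPGA resource usage. It only shows how one 1D-FFT routine, applied first to rows and then to columns, yields the 2D transform.

```python
import cmath


def fft_1d(x):
    """Recursive radix-2 decimation-in-time FFT.

    Stands in for a single 1D-FFT core: one input sequence in,
    one output sequence out. len(x) must be a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_1d(x[0::2])
    odd = fft_1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(matrix):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*matrix)]


def fft_2d(image):
    """2D FFT by row-column decomposition using only fft_1d."""
    rows_done = [fft_1d(row) for row in image]                 # pass 1: rows
    cols_done = [fft_1d(col) for col in transpose(rows_done)]  # pass 2: columns
    return transpose(cols_done)                                # back to [u][v]


def dft_2d_direct(image):
    """Direct 2D DFT from the definition, used only as a reference."""
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(
                image[r][c]
                * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                for r in range(rows)
                for c in range(cols)
            )
            for v in range(cols)
        ]
        for u in range(rows)
    ]
```

The transpose between the two passes plays the role that external memory access would play in a hardware design: the second pass reads the intermediate data in column order. For a small input, `fft_2d` can be checked against `dft_2d_direct`, which evaluates the 2D DFT definition term by term.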