In-Place Multicore SIMD Fast Fourier Transforms

Bibliographic Details
Published in: 2023 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-6
Main Authors: de Dinechin, Benoit Dupont; Hascoet, Julien; Desrentes, Oregane
Format: Conference Proceeding
Language: English
Published: IEEE, 25.09.2023
Summary: We revisit 1D Fast Fourier Transforms (FFT) implementation approaches in the context of compute units composed of multiple cores with SIMD ISA extensions and sharing a multi-banked local memory. A main constraint is to spare use of local memory, which motivates us to use in-place FFT implementations and to generate the twiddle factors with trigonometric recurrences. A key objective is to maximize bandwidth of the multi-banked local memory system by ensuring that cores issue maximum-width aligned non-temporal SIMD accesses. We propose combining the SIMD lane-slicing and sample partitioning techniques to derive multicore FFT implementations that do not require matrix transpositions and only involve one stage of bit-reverse unscrambling. This approach is demonstrated on the Kalray MPPA3 processor compute unit, where it outperforms the classic six-step algorithm for multicore FFT implementation.
ISSN: 2643-1971
DOI: 10.1109/HPEC58863.2023.10363536
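
Illustration: the summary mentions three ingredients that can be shown in a few lines: an in-place FFT, twiddle factors generated by a trigonometric recurrence instead of a lookup table, and a single bit-reverse unscrambling pass. The sketch below is a scalar, single-core radix-2 decimation-in-frequency FFT written only to illustrate those ideas. It is not the authors' implementation and does not attempt SIMD lane-slicing, sample partitioning, or anything specific to the Kalray MPPA3. The names `fft_in_place` and `naive_dft` are hypothetical, and the particular recurrence used here is a common textbook choice, assumed rather than taken from the paper.

```python
import math


def fft_in_place(x):
    """Radix-2 decimation-in-frequency FFT that overwrites the list x.

    Illustrative assumption: x is a list of complex numbers whose length
    is a power of two. No twiddle table or scratch buffer is allocated.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    span = n // 2
    while span >= 1:
        # This stage needs w_j = exp(i * j * theta) for j = 0 .. span-1.
        theta = -math.pi / span
        # Trigonometric recurrence: w <- w + w * (alpha + i*beta), where
        # alpha = cos(theta) - 1 = -2 sin^2(theta/2) and beta = sin(theta).
        alpha = -2.0 * math.sin(0.5 * theta) ** 2
        beta = math.sin(theta)
        w = 1.0 + 0.0j
        for j in range(span):
            for start in range(j, n, 2 * span):
                a = x[start]
                b = x[start + span]
                x[start] = a + b
                x[start + span] = (a - b) * w
            w = w + w * complex(alpha, beta)
        span //= 2

    # The stages above leave the result in bit-reversed order, so one
    # in-place unscrambling pass restores natural order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]


def naive_dft(x):
    """O(n^2) reference DFT, used only to check the sketch above."""
    n = len(x)
    out = []
    for f in range(n):
        acc = 0j
        for t in range(n):
            angle = -2.0 * math.pi * t * f / n
            acc += x[t] * complex(math.cos(angle), math.sin(angle))
        out.append(acc)
    return out
```

A call such as `fft_in_place(samples)` transforms `samples` in place; comparing the result against `naive_dft` on a copy of the input is a simple way to sanity-check it. The recurrence accumulates rounding error over long stages, so a production implementation would likely bound the recurrence length or reseed it periodically.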