Floating-Point Computations on Reconfigurable Computers

Bibliographic Details
Published in: 2007 DoD High Performance Computing Modernization Program Users Group Conference, pp. 339 - 344
Main Author: Morris, G.R.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2007
Summary: Modern reconfigurable computers (RCs) combine general-purpose processors with field programmable gate arrays (FPGAs). The FPGAs are, in effect, reconfigurable application-specific coprocessors. During one run, the FPGA might be a matrix-vector multiply coprocessor; during another run, it might be a linear equation solver. There are several issues associated with the mapping of floating-point computations onto RCs. One is the determination of what the author terms "the FPGA design boundary," i.e., the portion of the application that is mapped onto the FPGA. Furthermore, FPGA-based kernel performance is heavily dependent upon both pipelining and parallelism. The author has coined the phrase "the three p's" to encapsulate this important relationship. In this paper, important FPGA design boundary heuristics are described, and a toroidal architecture and partitioned loop algorithm are used to maximize both pipelining and parallelism for a double-precision floating-point sparse matrix conjugate gradient solver that is mapped onto a reconfigurable computer. Wall clock run time comparisons show that the FPGA-augmented version runs more than two times faster than the software-only version.
DOI:10.1109/HPCMP-UGC.2007.35
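
To make the "FPGA design boundary" idea concrete, the following is a minimal software sketch, not taken from the paper, of a double-precision conjugate gradient solver. The sparse matrix-vector multiply (spmv) is the kind of deeply pipelined, parallelizable kernel that would typically sit inside the FPGA design boundary, while the scalar reductions and vector updates remain on the general-purpose processor. The CSR data layout, function names, and test matrix are illustrative assumptions.

```c
/*
 * Hypothetical sketch of a CG iteration on a reconfigurable computer.
 * spmv() marks the candidate FPGA-side kernel; everything else stays
 * on the host processor.  Layout and names are assumptions.
 */
#include <math.h>
#include <stdio.h>

/* Candidate FPGA kernel: y = A*x for a CSR-format sparse matrix. */
static void spmv(int n, const int *rowptr, const int *col,
                 const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += val[k] * x[col[k]];   /* pipelined multiply-accumulate */
        y[i] = sum;
    }
}

static double dot(int n, const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* Host-side CG loop; solves A*x = b for symmetric positive-definite A. */
static int cg(int n, const int *rowptr, const int *col, const double *val,
              const double *b, double *x, double tol, int maxit)
{
    double r[16], p[16], q[16];          /* n <= 16 in this toy example */
    spmv(n, rowptr, col, val, x, q);
    for (int i = 0; i < n; i++) { r[i] = b[i] - q[i]; p[i] = r[i]; }
    double rr = dot(n, r, r);
    for (int it = 0; it < maxit; it++) {
        if (sqrt(rr) < tol) return it;
        spmv(n, rowptr, col, val, p, q);          /* FPGA-side work */
        double alpha = rr / dot(n, p, q);
        for (int i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
        double rr_new = dot(n, r, r);
        double beta = rr_new / rr;
        for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return maxit;
}

int main(void)
{
    /* 3x3 SPD tridiagonal matrix [2 -1 0; -1 2 -1; 0 -1 2] in CSR form. */
    int    rowptr[] = {0, 2, 5, 7};
    int    col[]    = {0, 1, 0, 1, 2, 1, 2};
    double val[]    = {2, -1, -1, 2, -1, -1, 2};
    double b[]      = {1, 0, 1};
    double x[]      = {0, 0, 0};
    int iters = cg(3, rowptr, col, val, b, x, 1e-12, 100);
    printf("converged in %d iterations: x = [%g %g %g]\n",
           iters, x[0], x[1], x[2]);
    return 0;
}
```

In a sketch like this, the inner spmv loop is where pipelining and parallelism (the paper's "three p's" relationship between performance, pipelining, and parallelism) pay off, which is why it is the natural candidate to place inside the FPGA design boundary.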