Towards a Universal FPGA Matrix-Vector Multiplication Architecture

Bibliographic Details
Published in	2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines, pp. 9-16
Main Authors	Kestur, S.; Davis, J. D.; Chung, E. S.
Format	Conference Proceeding
Language	English
Published	IEEE 01.04.2012
Subjects	Arrays; Decoding; dense matrix; Encoding; Field programmable gate arrays; FPGA; Libraries; reconfigurable computing; Sparse matrices; sparse matrix; spMV; Vectors

More Information
Summary:	We present the design and implementation of a universal, single-bitstream library for accelerating matrix-vector multiplication using FPGAs. Our library handles multiple matrix encodings, ranging from dense to multiple sparse formats. A key novelty in our approach is the introduction of a hardware-optimized sparse matrix representation called Compressed Variable-Length Bit Vector (CVBV), which reduces storage and bandwidth requirements by up to 43% (25% on average) compared to compressed sparse row (CSR) across all matrices in the University of Florida Sparse Matrix Collection. Our hardware incorporates a runtime-programmable decoder that performs on-the-fly decoding of various formats such as Dense, COO, CSR, DIA, and ELL. The flexibility and scalability of our design are demonstrated across two FPGA platforms: (1) the BEE3 (Virtex-5 LX155T with 16GB of DRAM) and (2) the ML605 (Virtex-6 LX240T with 2GB of DRAM). For dense matrices, our approach scales to large data sets with over 1 billion elements and achieves robust performance independent of the matrix aspect ratio. For sparse matrices, our approach using a compressed representation reduces the overall bandwidth while also achieving efficiency comparable to state-of-the-art approaches.
ISBN:	9781467316057 1467316059
DOI:	10.1109/FCCM.2012.12
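
Illustrative note: the summary describes a decoder that turns several matrix formats into one common form before the multiply. The idea can be sketched in software as below. This is a minimal Python illustration under stated assumptions, not the authors' hardware design: the function names (`decode_dense`, `decode_coo`, `decode_csr`, `matvec`) and the (row, column, value) stream are hypothetical choices made for this example, and CVBV, DIA, and ELL are omitted because the record does not specify their layouts.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

# Hypothetical common stream element: (row, column, value).
Triple = Tuple[int, int, float]


def decode_dense(matrix: Sequence[Sequence[float]]) -> Iterator[Triple]:
    """Stream the nonzero entries of a dense row-major matrix."""
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                yield r, c, v


def decode_coo(rows: Sequence[int], cols: Sequence[int],
               vals: Sequence[float]) -> Iterator[Triple]:
    """Stream entries of a coordinate-format (COO) matrix."""
    for r, c, v in zip(rows, cols, vals):
        yield r, c, v


def decode_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
               vals: Sequence[float]) -> Iterator[Triple]:
    """Stream entries of a compressed sparse row (CSR) matrix.

    row_ptr[r]..row_ptr[r + 1] delimits the entries of row r.
    """
    for r in range(len(row_ptr) - 1):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            yield r, col_idx[k], vals[k]


def matvec(stream: Iterable[Triple], x: Sequence[float],
           n_rows: int) -> List[float]:
    """Compute y = A @ x from a decoded (row, column, value) stream."""
    y = [0.0] * n_rows
    for r, c, v in stream:
        y[r] += v * x[c]
    return y


# The same 3x3 matrix in three encodings.
dense = [[1.0, 0.0, 2.0],
         [0.0, 0.0, 3.0],
         [4.0, 5.0, 0.0]]
coo = ([0, 0, 1, 2, 2], [0, 2, 2, 0, 1], [1.0, 2.0, 3.0, 4.0, 5.0])
csr = ([0, 2, 3, 5], [0, 2, 2, 0, 1], [1.0, 2.0, 3.0, 4.0, 5.0])
x = [1.0, 2.0, 3.0]

y_dense = matvec(decode_dense(dense), x, 3)
y_coo = matvec(decode_coo(*coo), x, 3)
y_csr = matvec(decode_csr(*csr), x, 3)
```

All three calls feed the same `matvec` routine, so only the decoder changes with the input format. Note that for this small matrix CSR stores four row pointers where COO stores five row indices; the saving grows with the number of nonzeros per row.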