Scaling up HBM Efficiency of Top-K SpMV for Approximate Embedding Similarity on FPGAs

Bibliographic Details
Published in	2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 799-804
Main Authors	Parravicini, Alberto, Cellamare, Luca Giuseppe, Siracusa, Marco, Santambrogio, Marco D.
Format	Conference Proceeding
Language	English
Published	IEEE 05.12.2021
Subjects	Approximate Computing; Bandwidth; Computer architecture; Design automation; FPGA; Graphics processing units; Hardware Acceleration; HBM; Layout; Sparse matrices; Spectral efficiency; SpMV
DOI	10.1109/DAC18074.2021.9586203

Summary:	Top-K SpMV is a key component of similarity search on sparse embeddings. This sparse workload does not perform well on general-purpose NUMA systems that employ traditional caching strategies. By contrast, modern FPGA accelerator cards have a few tricks up their sleeve. We introduce a Top-K SpMV FPGA design that leverages reduced precision and a novel packet-wise CSR matrix compression, enabling custom data layouts and delivering bandwidth efficiency often unreachable even in architectures with higher peak bandwidth. With HBM-based boards, we are 100x faster than a multi-threaded CPU implementation and 2x faster than a GPU with 20% higher bandwidth, with 14.2x higher power efficiency.
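As a rough illustration of what the Top-K SpMV kernel computes, the following Python sketch multiplies a CSR matrix (one sparse embedding per row) by a dense query vector and keeps only the K highest-scoring rows. It is a plain software reference under stated assumptions, not the authors' FPGA design: the function names `topk_spmv` and `quantize` and the `frac_bits` parameter are hypothetical, the fixed-point helper only hints at the general idea of reduced precision, and the paper's packet-wise CSR compression is not modeled at all.

```python
import heapq
from typing import List, Tuple


def quantize(values: List[float], frac_bits: int = 8) -> List[int]:
    """Hypothetical fixed-point quantization: scale by 2**frac_bits and round."""
    scale = 1 << frac_bits
    return [round(v * scale) for v in values]


def topk_spmv(indptr: List[int], indices: List[int], data: List[float],
              x: List[float], k: int) -> List[Tuple[int, float]]:
    """Return the k (row, score) pairs with the largest entries of y = A @ x.

    A is stored in CSR form: row i's nonzeros are data[indptr[i]:indptr[i+1]],
    located at columns indices[indptr[i]:indptr[i+1]].
    """
    heap: List[Tuple[float, int]] = []  # min-heap of (score, row), never larger than k
    for row in range(len(indptr) - 1):
        # Sparse dot product of one embedding (row) with the dense query x.
        score = 0.0
        for p in range(indptr[row], indptr[row + 1]):
            score += data[p] * x[indices[p]]
        # Keep the row only if it beats the current k-th best score.
        if len(heap) < k:
            heapq.heappush(heap, (score, row))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, row))
    # Report results from highest to lowest score.
    return [(row, score) for score, row in sorted(heap, reverse=True)]


# Usage: 4 embeddings over 4 dimensions, query selects dimensions 0 and 2.
indptr = [0, 2, 3, 5, 6]
indices = [0, 2, 1, 0, 3, 2]
data = [0.5, 0.5, 1.0, 0.9, 0.1, 0.2]
x = [1.0, 0.0, 1.0, 0.0]
print(topk_spmv(indptr, indices, data, x, 2))  # [(0, 1.0), (2, 0.9)]
```

The size-k min-heap means each row is scored once and discarded unless it enters the current top K, so the full result vector y never has to be materialized. Running the same function on `quantize(data)` and `quantize(x)` yields the same two rows in this example, because uniform scaling preserves the ranking; in general, rounding error can reorder rows whose scores are very close.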