Scaling up HBM Efficiency of Top-K SpMV for Approximate Embedding Similarity on FPGAs
Published in | 2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 799-804 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 05.12.2021 |
DOI | 10.1109/DAC18074.2021.9586203 |
Summary: | Top-K SpMV is a key component of similarity search on sparse embeddings. This sparse workload does not perform well on general-purpose NUMA systems that employ traditional caching strategies. Instead, modern FPGA accelerator cards have a few tricks up their sleeve. We introduce a Top-K SpMV FPGA design that leverages reduced precision and a novel packet-wise CSR matrix compression, enabling custom data layouts and delivering bandwidth efficiency often unreachable even in architectures with higher peak bandwidth. With HBM-based boards, we are 100x faster than a multi-threaded CPU implementation and 2x faster than a GPU with 20% higher bandwidth, with 14.2x higher power efficiency. |
---|---|
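For readers unfamiliar with the workload named in the summary, the sketch below shows what Top-K SpMV computes: a sparse matrix A (one sparse embedding per row, stored in standard CSR form as `indptr`, `indices`, `data`) is multiplied by a query vector x, and only the K rows with the highest scores are kept. This is a plain software reference under stated assumptions, not the paper's FPGA design: it uses ordinary 64-bit Python floats and standard CSR, and does not model the reduced-precision arithmetic or the packet-wise CSR compression the authors describe. The function name `top_k_spmv` is hypothetical. Note that each nonzero is read exactly once with no reuse, which is why this kernel tends to be limited by memory bandwidth rather than compute.

```python
import heapq
from typing import List, Sequence, Tuple


def top_k_spmv(indptr: Sequence[int], indices: Sequence[int],
               data: Sequence[float], x: Sequence[float],
               k: int) -> List[Tuple[int, float]]:
    """Hypothetical reference: compute y = A @ x for a CSR matrix A and
    return the k (row, score) pairs with the largest scores, best first.

    Rows are streamed one at a time and only a size-k min-heap is kept,
    so the full output vector y is never materialized.
    """
    heap: List[Tuple[float, int]] = []  # min-heap of (score, row)
    num_rows = len(indptr) - 1
    for row in range(num_rows):
        # Dot product of one sparse row with the dense query vector.
        score = 0.0
        for p in range(indptr[row], indptr[row + 1]):
            score += data[p] * x[indices[p]]
        if len(heap) < k:
            heapq.heappush(heap, (score, row))
        elif score > heap[0][0]:
            # Evict the current worst candidate.
            heapq.heapreplace(heap, (score, row))
    # Highest score first; ties go to the smaller row index.
    ordered = sorted(heap, key=lambda t: (-t[0], t[1]))
    return [(row, score) for score, row in ordered]


# 4x4 example matrix (one embedding per row):
#   row 0: [1, 0, 2, 0]
#   row 1: [0, 3, 0, 0]
#   row 2: [1, 1, 0, 1]
#   row 3: [0, 0, 0, 4]
indptr = [0, 2, 3, 6, 7]
indices = [0, 2, 1, 0, 1, 3, 3]
data = [1.0, 2.0, 3.0, 1.0, 1.0, 1.0, 4.0]
x = [1.0, 0.5, 0.0, 1.0]

print(top_k_spmv(indptr, indices, data, x, k=2))  # [(3, 4.0), (2, 2.5)]
```

The bounded heap keeps memory at O(K) regardless of the number of rows; how the paper's hardware performs the Top-K selection is not stated in this record.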