Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations
Published in | 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) pp. 968 - 981 |
---|---|
Main Authors | Ranggi Hwang, Taehun Kim, Youngeun Kwon, Minsoo Rhu |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.05.2020 |
Summary | Personalized recommendations are the backbone machine learning (ML) algorithm that powers several important application domains (e.g., ads, e-commerce) serviced from cloud datacenters. Sparse embedding layers are a crucial building block in designing recommendations, yet little attention has been paid to properly accelerating this important ML algorithm. This paper first provides a detailed workload characterization of personalized recommendations and identifies two significant performance limiters: memory-intensive embedding layers and compute-intensive multi-layer perceptron (MLP) layers. We then present Centaur, a chiplet-based hybrid sparse-dense accelerator that addresses both the memory throughput challenges of embedding layers and the compute limitations of MLP layers. We implement and demonstrate our proposal on an Intel HARPv2, a package-integrated CPU+FPGA device, which shows a 1.7-17.2× performance speedup and 1.7-19.5× energy-efficiency improvement over conventional approaches. |
DOI | 10.1109/ISCA45697.2020.00083 |
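
For readers unfamiliar with the workload structure the summary refers to, the sketch below is a minimal NumPy toy (not taken from the paper) showing why recommendation models exhibit the two performance limiters the authors identify: irregular, memory-bound embedding gathers over large sparse tables, followed by compute-bound dense MLP layers. All table counts, sizes, and layer widths are illustrative assumptions, and the model omits the feature-interaction stage found in production recommenders.

```python
# Illustrative sketch (not from the paper): a toy recommendation-model forward
# pass showing memory-bound sparse embedding gathers and compute-bound dense
# MLP layers. All sizes are arbitrary assumptions chosen for readability;
# real embedding tables are orders of magnitude larger.
import numpy as np

rng = np.random.default_rng(0)

# Sparse-feature side: several embedding tables accessed by irregular gathers.
NUM_TABLES, ROWS_PER_TABLE, EMB_DIM = 8, 100_000, 64
tables = [rng.standard_normal((ROWS_PER_TABLE, EMB_DIM), dtype=np.float32)
          for _ in range(NUM_TABLES)]

def embedding_layer(batch_indices):
    """Gather and sum-pool one vector per table (memory-intensive, low reuse)."""
    pooled = [tables[t][idx].sum(axis=0)        # irregular DRAM reads
              for t, idx in enumerate(batch_indices)]
    return np.concatenate(pooled)

# Dense side: a small MLP whose GEMMs are compute-intensive.
def mlp(x, weights):
    for w in weights:
        x = np.maximum(x @ w, 0.0)              # GEMM + ReLU
    return x

weights = [
    rng.standard_normal((NUM_TABLES * EMB_DIM, 256), dtype=np.float32) * 0.01,
    rng.standard_normal((256, 64), dtype=np.float32) * 0.01,
    rng.standard_normal((64, 1), dtype=np.float32) * 0.01,
]

# One sample: each table is looked up with a handful of sparse feature indices.
sample = [rng.integers(0, ROWS_PER_TABLE, size=4) for _ in range(NUM_TABLES)]
score = mlp(embedding_layer(sample), weights)
print("click-probability logit:", score.item())
```

The split visible in the sketch is what motivates a hybrid design such as Centaur: the gather-and-pool loop is dominated by random memory accesses with almost no arithmetic, while the MLP is a chain of dense matrix multiplies, so a single fixed accelerator datapath serves one of the two poorly.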