Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations
Published in | 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) pp. 968 - 981 |
---|---|
Main Authors | Ranggi Hwang, Taehun Kim, Youngeun Kwon, Minsoo Rhu |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.05.2020 |
Summary | Personalized recommendations are the backbone machine learning (ML) algorithm that powers several important application domains (e.g., ads, e-commerce) serviced from cloud datacenters. Sparse embedding layers are a crucial building block in designing recommendations, yet little attention has been paid to properly accelerating this important ML algorithm. This paper first provides a detailed workload characterization of personalized recommendations and identifies two significant performance limiters: memory-intensive embedding layers and compute-intensive multi-layer perceptron (MLP) layers. We then present Centaur, a chiplet-based hybrid sparse-dense accelerator that addresses both the memory throughput challenges of embedding layers and the compute limitations of MLP layers. We implement and demonstrate our proposal on an Intel HARPv2, a package-integrated CPU+FPGA device, which shows a 1.7-17.2× performance speedup and 1.7-19.5× energy-efficiency improvement over conventional approaches. |
DOI | 10.1109/ISCA45697.2020.00083 |
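
For readers unfamiliar with the workload structure the summary refers to, the sketch below is a minimal NumPy toy (not taken from the paper) showing why recommendation models exhibit the two performance limiters the authors identify: irregular, memory-bound embedding gathers over large sparse tables, followed by compute-bound dense MLP layers. All table counts, sizes, and layer widths are illustrative assumptions, and the model omits the feature-interaction stage found in production recommenders.

```python
# Illustrative sketch (not from the paper): a toy recommendation-model forward
# pass showing memory-bound sparse embedding gathers and compute-bound dense
# MLP layers. All sizes are arbitrary assumptions chosen for readability;
# real embedding tables are orders of magnitude larger.
import numpy as np

rng = np.random.default_rng(0)

# Sparse-feature side: several embedding tables accessed by irregular gathers.
NUM_TABLES, ROWS_PER_TABLE, EMB_DIM = 8, 100_000, 64
tables = [rng.standard_normal((ROWS_PER_TABLE, EMB_DIM), dtype=np.float32)
          for _ in range(NUM_TABLES)]

def embedding_layer(batch_indices):
    """Gather and sum-pool one vector per table (memory-intensive, low reuse)."""
    pooled = [tables[t][idx].sum(axis=0)        # irregular DRAM reads
              for t, idx in enumerate(batch_indices)]
    return np.concatenate(pooled)

# Dense side: a small MLP whose GEMMs are compute-intensive.
def mlp(x, weights):
    for w in weights:
        x = np.maximum(x @ w, 0.0)              # GEMM + ReLU
    return x

weights = [
    rng.standard_normal((NUM_TABLES * EMB_DIM, 256), dtype=np.float32) * 0.01,
    rng.standard_normal((256, 64), dtype=np.float32) * 0.01,
    rng.standard_normal((64, 1), dtype=np.float32) * 0.01,
]

# One sample: each table is looked up with a handful of sparse feature indices.
sample = [rng.integers(0, ROWS_PER_TABLE, size=4) for _ in range(NUM_TABLES)]
score = mlp(embedding_layer(sample), weights)
print("click-probability logit:", score.item())
```

The split visible in the sketch is what motivates a hybrid design such as Centaur: the gather-and-pool loop is dominated by random memory accesses with almost no arithmetic, while the MLP is a chain of dense matrix multiplies, so a single fixed accelerator datapath serves one of the two poorly.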