An efficient segmented quantization for graph neural networks

Bibliographic Details
Published in: CCF Transactions on High Performance Computing (Online), Vol. 4, No. 4, pp. 461–473
Main Authors: Dai, Yue; Tang, Xulong; Zhang, Youtao
Format: Journal Article
Language: English
Published: Singapore: Springer Nature Singapore, 01.12.2022; Springer Nature B.V.

Summary: Graph Neural Networks (GNNs) are recently developed machine learning approaches that exploit advances in neural networks for a wide range of graph applications. While GNNs achieve promising inference accuracy improvements over conventional approaches, their efficiency suffers from expensive computation and intensive memory access in the feature aggregation and combination phases, leading to high inference latency. Recent studies proposed mixed-precision feature quantization to address the memory access overhead; however, its coarse linear approximation and added computation complexity become the main constraints on overall GNN accuracy and performance. In this paper, we propose segmented quantization, which partitions the feature range into segments, customizes the linear approximation within each segment based on the original value density, and conducts efficient mixed-precision computation between quantized features and full-precision weights. Segmented quantization helps achieve high inference accuracy while maintaining low computation complexity. We also devise a hardware accelerator to fully exploit the benefits of segmented quantization. Our experiments show improvements of up to 5% in average accuracy and up to 6.8× in performance over state-of-the-art GNN accelerators.
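The summary describes segmented quantization only at a high level. As a rough, hypothetical sketch (not the paper's actual algorithm, segment-selection policy, or data layout), a density-aware segmented quantizer could split the feature range at quantiles so that each segment holds roughly equal value density, then apply an independent linear scale within each segment. The segment count, bit width, and quantile-based boundary choice below are illustrative assumptions.

```python
import numpy as np

def segmented_quantize(x, num_segments=4, bits_per_segment=4):
    """Quantize a feature array with a separate linear scale per segment.

    Segment boundaries are chosen at quantiles (an assumption for this
    sketch), so denser value regions get finer effective resolution than
    a single global linear scale would give at the same bit width.
    """
    # Density-aware boundaries: equal fraction of values per segment.
    edges = np.quantile(x, np.linspace(0.0, 1.0, num_segments + 1))
    seg_ids = np.clip(np.searchsorted(edges, x, side="right") - 1,
                      0, num_segments - 1)

    levels = 2 ** bits_per_segment - 1
    scales = np.empty(num_segments)
    codes = np.empty_like(x, dtype=np.int32)
    for s in range(num_segments):
        mask = seg_ids == s
        lo, hi = edges[s], edges[s + 1]
        scales[s] = max(hi - lo, 1e-12) / levels  # per-segment linear scale
        codes[mask] = np.round((x[mask] - lo) / scales[s]).astype(np.int32)
    return codes, seg_ids, scales, edges

def segmented_dequantize(codes, seg_ids, scales, edges):
    """Reconstruct approximate full-precision values from segment codes."""
    return edges[seg_ids] + codes * scales[seg_ids]

# Usage: quantize a skewed feature vector and check reconstruction error.
rng = np.random.default_rng(0)
feats = rng.exponential(scale=1.0, size=10_000).astype(np.float32)
codes, seg_ids, scales, edges = segmented_quantize(feats)
recon = segmented_dequantize(codes, seg_ids, scales, edges)
print("mean abs error:", np.abs(feats - recon).mean())
```

The quantile-based boundaries are what distinguish this from plain uniform quantization: dense regions of the feature distribution receive more of the code space, which is the intuition behind the paper's accuracy gains, while each segment still uses a cheap linear mapping compatible with mixed-precision arithmetic against full-precision weights.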
ISSN: 2524-4922, 2524-4930
DOI: 10.1007/s42514-022-00121-z