NTGAT: A Graph Attention Network Accelerator with Runtime Node Tailoring


Bibliographic Details
Published in: Proceedings of the 28th Asia and South Pacific Design Automation Conference, pp. 645-650
Main Authors: Hou, Wentao; Zhong, Kai; Zeng, Shulin; Dai, Guohao; Yang, Huazhong; Wang, Yu
Format: Conference Proceeding
Language: English
Published: New York, NY, USA: ACM, 16.01.2023
Series: ACM Conferences

Summary: Graph Attention Network (GAT) has demonstrated better performance on many graph tasks than previous Graph Neural Networks (GNNs). However, it involves graph attention operations that add computational complexity. While a large body of existing literature has studied GNN acceleration, few works have focused on the attention mechanism in GAT. The graph attention mechanism changes the computation flow, so previous GNN accelerators cannot support GAT well. Moreover, GAT distinguishes the importance of neighbors, which makes it possible to reduce the workload through runtime tailoring. We present NTGAT, a software-hardware co-design approach to accelerating GAT with runtime node tailoring. Our work comprises both a runtime node tailoring algorithm and an accelerator design. We propose a pipelined sorting method and a hardware unit to support node tailoring during inference. Experiments show that our algorithm can reduce up to 86% of the aggregation workload while incurring a slight accuracy loss (<0.4%), and the FPGA-based accelerator achieves up to 3.8x speedup and 4.98x energy efficiency compared to the GPU baseline.
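The core idea in the summary — dropping low-importance neighbors at runtime using GAT's own attention scores — can be illustrated with a minimal sketch. This is not the paper's exact algorithm; the function name, the `keep_ratio` knob, and the renormalization step are assumptions for illustration only:

```python
import numpy as np

def tailored_aggregate(features, neighbor_ids, attn_scores, keep_ratio=0.5):
    """Aggregate neighbor features, keeping only the top-scoring
    neighbors by attention weight (hypothetical sketch of runtime
    node tailoring; keep_ratio is an illustrative parameter)."""
    k = max(1, int(len(neighbor_ids) * keep_ratio))
    # Sort neighbors by attention score (descending) and keep the top k.
    order = np.argsort(attn_scores)[::-1][:k]
    kept_ids = [neighbor_ids[i] for i in order]
    kept_scores = attn_scores[order]
    # Renormalize the surviving attention weights so they sum to 1.
    weights = kept_scores / kept_scores.sum()
    # Weighted sum of the kept neighbors' feature vectors.
    return weights @ features[kept_ids]
```

Because neighbors with near-zero attention contribute little to the weighted sum, discarding them shrinks the aggregation workload with only a small perturbation of the output, which is the intuition behind the reported 86% workload reduction at <0.4% accuracy loss.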
ISBN: 9781450397834, 1450397832
DOI: 10.1145/3566097.3567869