Global Attention Improves Graph Networks Generalization
Format: Journal Article
Language: English
Published: 14.06.2020
Online Access: Get full text
Summary: This paper advocates incorporating a Low-Rank Global Attention (LRGA) module, a computation- and memory-efficient variant of dot-product attention (Vaswani et al., 2017), into Graph Neural Networks (GNNs) to improve their generalization power. To theoretically quantify the generalization properties granted by adding the LRGA module to GNNs, we focus on a specific family of expressive GNNs and show that augmenting it with LRGA provides algorithmic alignment with a powerful graph isomorphism test, namely the 2-Folklore Weisfeiler-Lehman (2-FWL) algorithm. In more detail, we: (i) consider the recent Random Graph Neural Network (RGNN) (Sato et al., 2020) framework and prove that it is universal in probability; (ii) show that RGNN augmented with LRGA aligns with the 2-FWL update step via polynomial kernels; and (iii) bound the sample complexity of the kernel's feature map when learned with a randomly initialized two-layer MLP. From a practical point of view, augmenting existing GNN layers with LRGA produces state-of-the-art results on current GNN benchmarks. Lastly, we observe that augmenting various GNN architectures with LRGA often closes the performance gap between different models.
DOI: 10.48550/arxiv.2006.07846
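For orientation, the following is a minimal PyTorch sketch of a generic low-rank global attention block consistent with the abstract's description: a rank-k factorization of dot-product attention whose memory cost scales with n·k rather than n². The class name `LowRankGlobalAttention`, the use of plain linear projections, and the 1/n normalization are illustrative assumptions, not the authors' reference implementation of LRGA.

```python
import torch
import torch.nn as nn


class LowRankGlobalAttention(nn.Module):
    """Rank-k global attention over node features (illustrative sketch).

    Rather than materializing the full n x n attention matrix of
    dot-product attention, the interaction is factored through four
    n x k projections, so memory scales with n*k instead of n^2.
    """

    def __init__(self, in_dim: int, k: int):
        super().__init__()
        # Four independent rank-k projections (assumption: plain linear
        # maps; small MLPs could be substituted).
        self.u1 = nn.Linear(in_dim, k)
        self.u2 = nn.Linear(in_dim, k)
        self.u3 = nn.Linear(in_dim, k)
        self.u4 = nn.Linear(in_dim, k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n, d) node features of a single graph.
        n = x.shape[0]
        a, b = self.u1(x), self.u2(x)  # each (n, k)
        c, d = self.u3(x), self.u4(x)  # each (n, k)
        # Associate the products as a @ ((b^T c) @ (d^T x)) so no n x n
        # matrix is ever formed: (k,k) @ (k,d) first, then (n,k) @ (k,d).
        # The 1/n factor is a crude normalization chosen here for illustration.
        global_term = a @ ((b.t() @ c) @ (d.t() @ x)) / n  # (n, d)
        # Concatenate with the input, matching the abstract's notion of
        # "augmenting" an existing GNN layer with a global attention term.
        return torch.cat([x, global_term], dim=-1)  # (n, 2d)


# Tiny usage example: 5 nodes, 16-dim features, rank-4 attention.
x = torch.randn(5, 16)
layer = LowRankGlobalAttention(in_dim=16, k=4)
out = layer(x)
print(out.shape)  # torch.Size([5, 32])
```

Because the block only concatenates extra channels onto the node features, a module of this kind can be inserted after any message-passing layer without changing the rest of the architecture, which is the sense in which the abstract speaks of augmenting existing GNN layers.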