Backward Oversmoothing: why is it hard to train deep Graph Neural Networks?
Format | Journal Article
---|---
Language | English
Published | 22.05.2025
DOI | 10.48550/arxiv.2505.16736
Summary: Oversmoothing has long been identified as a major limitation of Graph Neural Networks (GNNs): input node features are smoothed at each layer and converge to a non-informative representation if the weights of the GNN are sufficiently bounded. This assumption is crucial: if, on the contrary, the weights are sufficiently large, then oversmoothing may not happen. In theory, a GNN could thus learn not to oversmooth. In practice, however, this rarely happens, which prompts us to examine oversmoothing from an optimization point of view. In this paper, we analyze backward oversmoothing, that is, the notion that the backpropagated errors used to compute gradients are themselves subject to oversmoothing from output to input. With non-linear activation functions, we outline the key role of the interaction between forward and backward smoothing. Moreover, we show that, due to backward oversmoothing, GNNs provably exhibit many spurious stationary points: as soon as the last layer is trained, the whole GNN is at a stationary point. As a result, we can exhibit regions where gradients are near-zero while the loss remains high. The proof relies on the fact that, unlike forward oversmoothing, backward errors undergo a linear form of oversmoothing even in the presence of non-linear activation functions, so that the average of the output error plays a key role. Additionally, we show that this phenomenon is specific to deep GNNs, and exhibit a Multi-Layer Perceptron counter-example where it does not occur. This paper is a step toward a more complete understanding of the optimization landscape specific to GNNs.
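As a quick illustration of the two phenomena described in the abstract, the following minimal sketch (not the paper's code; the graph, depth, layer sizes, and loss are arbitrary choices for demonstration) builds a deep GCN-style network in PyTorch and reports, per layer, a feature-spread measure for the forward pass and the weight-gradient norm for the backward pass. Typically the spread shrinks with depth (forward oversmoothing) while early-layer gradients are orders of magnitude smaller than the last layer's (backward oversmoothing), even though the loss itself stays high.

```python
# Illustrative sketch only: a deep GCN on a random graph, showing
# (i) forward oversmoothing  -- node features collapse toward a common vector, and
# (ii) backward oversmoothing -- gradients of early layers are much smaller than
#      those of the last layer, while the loss remains large.
import torch

torch.manual_seed(0)
n, d, depth = 50, 16, 20          # nodes, feature dimension, number of layers (arbitrary)

# Random undirected graph with self-loops, symmetrically normalized (GCN-style propagation).
A = (torch.rand(n, n) < 0.1).float()
A = ((A + A.t()) > 0).float()
A.fill_diagonal_(1.0)
deg_inv_sqrt = A.sum(dim=1).pow(-0.5)
A_hat = deg_inv_sqrt[:, None] * A * deg_inv_sqrt[None, :]

X = torch.randn(n, d)             # random input node features
y = torch.randn(n, d)             # dummy regression targets
layers = torch.nn.ModuleList([torch.nn.Linear(d, d) for _ in range(depth)])

# Forward pass, keeping every intermediate representation.
H, hidden = X, []
for layer in layers:
    H = torch.relu(A_hat @ layer(H))
    hidden.append(H)

def node_spread(Z):
    # Mean distance of node features to their average: a small spread means smoothed features.
    return (Z - Z.mean(dim=0, keepdim=True)).norm(dim=1).mean().item()

loss = ((H - y) ** 2).mean()
loss.backward()

print(f"loss = {loss.item():.3f}")
for i, (layer, H_i) in enumerate(zip(layers, hidden)):
    print(f"layer {i:2d}: spread = {node_spread(H_i):.3e}   "
          f"weight grad norm = {layer.weight.grad.norm().item():.3e}")
```

Re-running the same sketch with a small depth (e.g. 3) typically makes both effects largely disappear, consistent with the abstract's claim that the phenomenon is specific to deep GNNs.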