Scalable Graph Learning with Graph Convolutional Networks and Graph Attention Networks: Addressing Class Imbalance Through Augmentation and Optimized Hyperparameter Tuning

Bibliographic Details
Published in	International Journal of Advanced Computer Science and Applications, Vol. 16, No. 7
Main Authors	Touate, Chaima Ahle; Ayachi, Rachid El; Biniz, Mohamed
Format	Journal Article
Language	English
Published	West Yorkshire: Science and Information (SAI) Organization Limited, 2025
Subjects	Approximation; Architecture; Artificial neural networks; Attention; Classification; Computer science; Data replication; Datasets; Empirical analysis; Innovations; Machine learning; Natural language processing; Neural networks; Semantics; Text categorization; Tuning
ISSN	2158-107X, 2156-5570
DOI	10.14569/IJACSA.2025.0160740

More Information
Summary:	In this study, we propose a graph-based node classification approach that addresses data scarcity, class imbalance, limited access to the original textual content of benchmark datasets, semantic preservation, and model generalization. Going beyond simple data replication, we enhanced the Cora dataset by extracting content from its original PostScript files and applying a three-dimensional framework that combines three NLP-based techniques in a single pipeline: PEGASUS paraphrasing, synthetic model generation, and controlled subject-aware synonym replacement. This expanded the dataset to 17,780 nodes, roughly 6.57x its original size, while maintaining semantic fidelity (Word Mover's Distance (WMD) scores: 0.27-0.34). Bayesian hyperparameter tuning was conducted with Optuna and combined with k-fold cross-validation as a rigorous model validation protocol. Our Graph Convolutional Network (GCN) model achieves 95.42% accuracy and our Graph Attention Network (GAT) reaches 93.46%, even on a dataset significantly larger than the original. Our empirical analysis shows that semantic-preserving augmentation improved performance while maintaining model stability across scaled datasets, offering a cost-effective alternative to architectural complexity and making graph learning accessible in resource-constrained environments.
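The summary names a GCN but gives no architecture details (depth, hidden size, dropout, or how augmented nodes are connected to the citation graph). As a reference point only, the following is a minimal sketch of one standard GCN layer in the Kipf and Welling form, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), written in dependency-free Python and assuming an undirected graph stored as a dense 0/1 adjacency matrix. It is not the authors' implementation; the helper names `matmul`, `normalized_adjacency`, and `gcn_layer` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix using plain lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2, the renormalized adjacency of a GCN layer."""
    n = len(adj)
    # Add self-loops so each node also aggregates its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degrees in the self-looped graph are >= 1 for a non-negative adjacency,
    # so the inverse square root is always defined.
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One GCN layer: ReLU(A_norm @ H @ W)."""
    # Transform node features first (H @ W), then mix them over normalized neighborhoods.
    propagated = matmul(normalized_adjacency(adj), matmul(features, weights))
    # ReLU non-linearity, applied element-wise.
    return [[max(0.0, v) for v in row] for row in propagated]


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 with one-hot node features and a 3 x 1 weight matrix.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    weights = [[1.0], [1.0], [1.0]]
    print(gcn_layer(adj, features, weights))
```

The dense matrices here only keep the example short. At the scale reported above (17,780 nodes), practical implementations store the adjacency sparsely, for example through PyTorch Geometric's `GCNConv` and `GATConv` layers.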