Text Classification on Imbalanced Data using Graph Neural Networks and Adversarial Weight Balancer

Bibliographic Details
Published in: 2023 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), pp. 01-06
Main Authors: Badiei, Fatemeh; Kananian, Makan; Ghahramani, S. AmirAli Gh
Format: Conference Proceeding
Language: English
Published: IEEE, 04.12.2023

Summary: Text classification in imbalanced datasets poses a significant challenge in natural language processing (NLP) applications. In this paper, we address this issue by treating text data as graphs, with words as nodes, and employ two strategies for establishing connections between words within these graphs. The first strategy employs a fixed-size context window to connect words that lie within the same window, while the second strategy leverages word embedding vectors and a k-nearest neighbors (KNN) approach to connect each word to its k most similar neighbors. To further enhance the classification performance, we combine Graph Convolutional Neural Networks (GCNs) and Long Short-Term Memory (LSTM) networks to process the textual information. Importantly, we address the challenge of class imbalance using an adversarial loss framework. We introduce separate weight generator networks for each class within the dataset, responsible for dynamically assigning weights to individual samples during training. While the classifier aims to minimize its weighted cross-entropy loss, the weight-generating networks strategically assign higher weights to misclassified samples and reduce the weights assigned to correctly classified ones, thus increasing the classifier's loss in subsequent epochs. This adversarial loss, in conjunction with the LSTM-GCN architecture, results in finely tuned sample weights, leading to significantly improved accuracy and F1-Score compared to conventional methods for handling imbalanced text classification problems. Our experimental results demonstrate the effectiveness of our approach in addressing the challenges of imbalanced text classification, offering a solution for a wide range of NLP applications where imbalanced datasets are prevalent.
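
The two graph-construction strategies described in the summary can be made concrete with a short sketch. The code below is a minimal illustration, not the authors' implementation: `window_edges` and `knn_edges` are hypothetical helper names, and the window size and k are arbitrary illustrative values.

```python
# Minimal sketch of the two word-graph construction strategies from the
# abstract (illustrative only; parameter choices are assumptions).
import numpy as np

def window_edges(token_ids, window=3):
    """Strategy 1: connect words that co-occur in a fixed-size window."""
    edges = set()
    for i in range(len(token_ids)):
        for j in range(i + 1, min(i + window, len(token_ids))):
            if token_ids[i] != token_ids[j]:
                edges.add((token_ids[i], token_ids[j]))
                edges.add((token_ids[j], token_ids[i]))  # undirected
    return edges

def knn_edges(token_ids, embeddings, k=5):
    """Strategy 2: connect each word to its k most similar neighbors,
    measured by cosine similarity of word embedding vectors."""
    vocab = sorted(set(token_ids))
    vecs = embeddings[vocab]                           # (n, dim)
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sim = vecs @ vecs.T                                # cosine similarity
    np.fill_diagonal(sim, -np.inf)                     # exclude self-loops
    edges = set()
    for row, u in enumerate(vocab):
        for col in np.argsort(-sim[row])[:k]:          # top-k neighbors
            edges.add((u, vocab[col]))
    return edges

# Example usage on a toy token sequence with random embeddings:
tokens = [0, 1, 2, 1, 3]
emb = np.random.rand(4, 50)
graph = window_edges(tokens) | knn_edges(tokens, emb, k=2)
```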
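Likewise, the adversarial weighting scheme amounts to a two-player training loop: the classifier minimizes a weighted cross-entropy while per-class weight generators are updated to maximize that same loss, shifting weight toward samples the classifier still gets wrong. The PyTorch sketch below is an assumption about how such a loop could look, not the paper's code: the linear `classifier` stands in for the LSTM-GCN, `weight_gens`, `sample_weights`, and `train_step` are hypothetical names, and the mean-normalization of weights is an illustrative stabilizer the abstract does not specify.

```python
# Sketch of adversarial per-class sample weighting (assumed structure).
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, feat_dim = 2, 128
classifier = nn.Linear(feat_dim, num_classes)      # stand-in for LSTM-GCN
weight_gens = nn.ModuleList(                       # one generator per class
    [nn.Sequential(nn.Linear(feat_dim, 1), nn.Softplus())
     for _ in range(num_classes)]
)
opt_clf = torch.optim.Adam(classifier.parameters(), lr=1e-3)
opt_gen = torch.optim.Adam(weight_gens.parameters(), lr=1e-3)

def sample_weights(features, labels):
    """Each sample's weight comes from the generator of its own class."""
    all_w = torch.cat([g(features) for g in weight_gens], dim=1)
    w = all_w.gather(1, labels.unsqueeze(1)).squeeze(1)
    # Normalize to mean 1 (an assumed stabilizer): generators can only
    # raise a sample's weight by lowering others'.
    return w / (w.mean() + 1e-8)

def train_step(features, labels):
    # Classifier step: minimize the weighted cross-entropy, with the
    # weights treated as constants.
    w = sample_weights(features, labels).detach()
    per_sample = F.cross_entropy(classifier(features), labels,
                                 reduction="none")
    loss_clf = (w * per_sample).mean()
    opt_clf.zero_grad(); loss_clf.backward(); opt_clf.step()

    # Generator step: maximize the same loss against the (now frozen)
    # classifier, i.e. push weight onto misclassified samples.
    per_sample = F.cross_entropy(classifier(features), labels,
                                 reduction="none").detach()
    loss_gen = -(sample_weights(features, labels) * per_sample).mean()
    opt_gen.zero_grad(); loss_gen.backward(); opt_gen.step()
    return loss_clf.item()
```

The alternating updates mirror the adversarial dynamic the summary describes: minimizing `loss_gen` (the negated weighted loss) is what drives the generators to concentrate weight on hard samples in subsequent epochs.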
DOI: 10.1109/CSDE59766.2023.10487693