Clustering Algorithms and RAG Enhancing Semi-Supervised Text Classification with Large LLMs
Format | Journal Article |
---|---|
Language | English |
Published | 09.11.2024 |
Summary: | This paper introduces an innovative semi-supervised learning approach for
text classification, addressing the challenge of abundant data but limited
labeled examples. Our methodology integrates few-shot learning with
retrieval-augmented generation (RAG) and conventional statistical clustering,
enabling effective learning from a minimal number of labeled instances while
generating high-quality labeled data. To the best of our knowledge, we are the
first to incorporate RAG alongside clustering in text data generation. Our
experiments on the Reuters and Web of Science datasets demonstrate
state-of-the-art performance, with few-shot augmented data alone producing
results nearly equivalent to those achieved with fully labeled datasets.
Notably, accuracies of 95.41% and 82.43% were achieved for complex text
document classification tasks, where the number of categories can exceed 100. |
---|---|
DOI: | 10.48550/arxiv.2411.06175 |
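
The summary describes retrieving labeled examples to support few-shot classification. The sketch below illustrates only that retrieval idea in a generic form. It is not the paper's implementation: the bag-of-words similarity, the helper names (`vectorize`, `cosine`, `retrieve`, `build_prompt`), the prompt layout, and the toy labeled examples are all assumptions made for illustration. The clustering step and the call to a language model are left out.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, labeled, k=2):
    """Return the k labeled (text, label) pairs most similar to the query."""
    q = vectorize(query)
    ranked = sorted(labeled, key=lambda ex: cosine(q, vectorize(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, labeled, k=2):
    """Assemble a few-shot prompt from the retrieved examples (hypothetical layout)."""
    shots = [f"Text: {text}\nLabel: {label}" for text, label in retrieve(query, labeled, k)]
    shots.append(f"Text: {query}\nLabel:")
    return "\n\n".join(shots)


# Toy labeled pool, invented for this example.
labeled = [
    ("oil prices rise as crude supply tightens", "energy"),
    ("wheat harvest forecast cut after drought", "grain"),
    ("central bank raises interest rates", "finance"),
]

prompt = build_prompt("crude oil supply falls", labeled, k=1)
print(prompt)
```

In a fuller pipeline, the word-count vectors would typically be replaced by learned embeddings, and the resulting prompt would be sent to a language model to label the unlabeled document.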