Advanced pseudo-labeling approach in mixing-based text data augmentation method

Bibliographic Details
Published in	Pattern analysis and applications : PAA Vol. 27; no. 4
Main Authors	Park, Jungmin, Lee, Younghoon
Format	Journal Article
Language	English
Published	London: Springer London, 01.12.2024; Springer Nature B.V
Subjects	Artificial intelligence; Computer Science; Data augmentation; Industrial and Commercial Application; Labeling; Labels; Pattern Recognition; Words (language); Explainable artificial intelligence; Mix-up approach; Over-fitting prevention; Text augmentation; Word-explainability
ISSN	1433-7541 1433-755X
DOI	10.1007/s10044-024-01340-6

Summary:	Text augmentation methods facilitate an increase in the amount of training data, without having to collect new training data, by generating transformed versions of real datasets. Among such methods, mixing-based approaches, which swap words between two or more sentences, are widely applied owing to their simplicity and noteworthy performance. However, existing mixing-based approaches do not consider the importance of manipulated words during the pseudo-labeling process because they utilize a naive linear interpolation method. Thus, this paper proposes an advanced mixing-based text augmentation approach based on artificial intelligence methods that explicitly reflect the importance of manipulated words in the pseudo-labeling process. In addition, to avoid overdependence on the pseudo-labeling quality in the training process, the difference between the original label and prediction is also reflected in the loss function. Experimental results indicate that the performance of the proposed method is significantly higher than that of existing approaches.
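
The contrast drawn in the summary, between pseudo-labels obtained by naive linear interpolation and pseudo-labels that reflect the importance of the manipulated words, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the example sentences, and the per-word importance scores are all hypothetical, and the record does not say how the paper derives importance scores or combines them into a label. The sketch only assumes that each word carries a non-negative importance weight and that the mixing ratio is taken from the importance mass each sentence contributes.

```python
from typing import List, Sequence, Set


def mix_sentences(tokens_a: Sequence[str], tokens_b: Sequence[str],
                  positions: Set[int]) -> List[str]:
    """Replace the words of sentence A at `positions` with the words of B."""
    mixed = list(tokens_a)
    for i in positions:
        mixed[i] = tokens_b[i]
    return mixed


def interpolate(label_a: Sequence[float], label_b: Sequence[float],
                lam: float) -> List[float]:
    """Soft label lam * label_a + (1 - lam) * label_b."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]


def naive_pseudo_label(label_a, label_b, positions, n_tokens):
    """Naive rule: the ratio depends only on how many words were swapped."""
    lam = 1.0 - len(positions) / n_tokens
    return interpolate(label_a, label_b, lam)


def weighted_pseudo_label(label_a, label_b, positions,
                          importance_a, importance_b):
    """Assumed rule: the ratio depends on the importance mass kept from A
    versus the importance mass brought in from B (hypothetical scores)."""
    kept = sum(w for i, w in enumerate(importance_a) if i not in positions)
    added = sum(importance_b[i] for i in positions)
    total = kept + added
    if total == 0:
        # No importance information available: fall back to the naive rule.
        return naive_pseudo_label(label_a, label_b, positions, len(importance_a))
    return interpolate(label_a, label_b, kept / total)


# Hypothetical example: one-hot labels are [positive, negative].
tokens_a = ["the", "movie", "was", "great"]
tokens_b = ["the", "plot", "was", "awful"]
label_a, label_b = [1.0, 0.0], [0.0, 1.0]
# Made-up importance scores; the sentiment word dominates in both sentences.
importance_a = [0.05, 0.15, 0.05, 0.75]
importance_b = [0.05, 0.15, 0.05, 0.75]
swapped = {3}  # swap only the last word

mixed = mix_sentences(tokens_a, tokens_b, swapped)
naive = naive_pseudo_label(label_a, label_b, swapped, len(tokens_a))
weighted = weighted_pseudo_label(label_a, label_b, swapped,
                                 importance_a, importance_b)
```

In this toy case the mixed sentence is "the movie was awful". The naive rule still labels it about 75% positive because only one of four words changed, whereas the importance-weighted rule labels it about 25% positive because the swapped word carries most of the importance.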