Specialized Pre-Training of Neural Networks on Synthetic Data for Improving Paraphrase Generation
Published in | *Cybernetics and Systems Analysis*, Vol. 60, No. 2, pp. 167–174
---|---
Main Authors | |
Format | Journal Article
Language | English
Published | New York: Springer US, 01.03.2024 (Springer Nature B.V.)
ISSN | 1060-0396; 1573-8337
DOI | 10.1007/s10559-024-00658-7
Summary: Paraphrase generation is a fundamental problem in natural language processing. Owing to the significant success of transfer learning, the “pre-training → fine-tuning” approach has become the standard. However, popular general pre-training methods typically require extensive datasets and substantial computational resources, and the available pre-trained models are limited to fixed architectures and sizes. The authors propose a simple and efficient pre-training procedure designed specifically for paraphrase generation; it noticeably improves the quality of generated paraphrases and substantially enhances general-purpose models. The procedure uses existing public data together with new data generated by large language models. The authors investigate how this pre-training affects neural networks of various architectures and demonstrate its efficiency across all of them.
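
The summary describes a two-stage “pre-training → fine-tuning” pipeline: a model is first pre-trained on synthetic paraphrase pairs, including pairs generated by large language models, and then fine-tuned on the target paraphrase corpus. Below is a minimal sketch of such a pipeline; the t5-small checkpoint, the toy synthetic pairs, and all hyperparameters are illustrative assumptions, not the authors’ actual setup.

```python
# A minimal sketch of the "pre-training -> fine-tuning" pipeline described in
# the summary, assuming a Hugging Face seq2seq model; the t5-small checkpoint,
# the toy synthetic pairs, and all hyperparameters are illustrative
# assumptions, not the authors' actual setup.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Stage 1: specialized pre-training on synthetic paraphrase pairs, e.g. pairs
# produced by a large language model (toy examples stand in for such data).
synthetic_pairs = [
    ("the cat sat on the mat", "a cat was sitting on the mat"),
    ("he quickly ran home", "he hurried back to his house"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model.train()
for source, target in synthetic_pairs:
    inputs = tokenizer("paraphrase: " + source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Stage 2: fine-tuning on the target paraphrase corpus repeats the same loop
# over task-specific pairs, typically with a lower learning rate.
```

In practice the pre-training corpus would contain many thousands of pairs and the loop would be batched; the sketch only illustrates the two-stage structure the summary refers to.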