Specialized Pre-Training of Neural Networks on Synthetic Data for Improving Paraphrase Generation
Published in | Cybernetics and Systems Analysis, Vol. 60, No. 2, pp. 167–174 |
---|---|
Main Authors | Skurzhanskyi, O. H.; Marchenko, O. O.; Anisimov, A. V. |
Format | Journal Article |
Language | English |
Published | New York: Springer US, 01.03.2024 (Springer; Springer Nature B.V.) |
ISSN | 1060-0396 (print); 1573-8337 (electronic) |
DOI | 10.1007/s10559-024-00658-7 |
Abstract | Paraphrase generation is a fundamental problem in natural language processing. Due to the significant success of transfer learning, the “pre-training → fine-tuning” approach has become the standard. However, popular general pre-training methods typically require extensive datasets and great computational resources, and the available pre-trained models are limited by fixed architecture and size. The authors have proposed a simple and efficient approach to pre-training specifically for paraphrase generation, which noticeably improves the quality of paraphrase generation and ensures substantial enhancement of general-purpose models. They have used existing public data and new data generated by large language models. The authors have investigated how this pre-training procedure impacts neural networks of various architectures and demonstrated its efficiency across all architectures. |
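The abstract describes a two-stage recipe: specialized pre-training on synthetic paraphrase pairs (drawn from public data or generated by large language models), followed by ordinary fine-tuning of the same weights on a labeled paraphrase corpus. Below is a minimal sketch of that flow using Hugging Face `transformers`; the `t5-small` backbone, the toy sentence pairs, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the "pre-training -> fine-tuning" recipe described in the
# abstract. NOT the authors' code: the checkpoint, column names, toy data, and
# hyperparameters below are assumptions chosen for illustration.
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("t5-small")  # any seq2seq backbone
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def encode(batch):
    # Each record is a (source sentence, paraphrase) pair.
    enc = tokenizer(batch["source"], truncation=True, max_length=64)
    enc["labels"] = tokenizer(
        batch["paraphrase"], truncation=True, max_length=64
    )["input_ids"]
    return enc

# Stage 1: specialized pre-training on synthetic paraphrase pairs,
# e.g. pairs generated by a large language model.
synthetic = Dataset.from_dict({
    "source": ["how do i reset my password?"],
    "paraphrase": ["what is the way to reset my password?"],
}).map(encode, batched=True, remove_columns=["source", "paraphrase"])

# Stage 2: fine-tuning the same weights on the smaller human-labeled corpus.
labeled = Dataset.from_dict({
    "source": ["the weather is nice today."],
    "paraphrase": ["today the weather is pleasant."],
}).map(encode, batched=True, remove_columns=["source", "paraphrase"])

collator = DataCollatorForSeq2Seq(tokenizer, model=model)
for stage, data in [("pretrain", synthetic), ("finetune", labeled)]:
    trainer = Seq2SeqTrainer(
        model=model,
        args=Seq2SeqTrainingArguments(
            output_dir=f"out-{stage}",
            num_train_epochs=1,
            per_device_train_batch_size=1,
        ),
        train_dataset=data,
        data_collator=collator,
    )
    trainer.train()  # stage-1 weights carry over as the stage-2 initialization
```

Because both stages reuse one model object, whatever the network learns from the synthetic pairs in stage 1 becomes the initialization for stage 2, which is the point of specialized pre-training over training from a generic checkpoint alone.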
Audience | Academic |
Author | Anisimov, A. V.; Skurzhanskyi, O. H.; Marchenko, O. O. |
Author_xml | 1. Skurzhanskyi, O. H. (oleksandr.skurzhanskyi@gmail.com), Taras Shevchenko National University of Kyiv; 2. Marchenko, O. O., Taras Shevchenko National University of Kyiv; 3. Anisimov, A. V., Taras Shevchenko National University of Kyiv |
ContentType | Journal Article |
Copyright | Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. COPYRIGHT 2024 Springer |
DOI | 10.1007/s10559-024-00658-7 |
DatabaseName | CrossRef; Gale In Context: Science (UHCL Subscription); ProQuest Computer Science Collection |
DatabaseTitle | CrossRef; ProQuest Computer Science Collection |
Discipline | Sciences (General); Mathematics |
EISSN | 1573-8337 |
EndPage | 174 |
ISSN | 1060-0396 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 2 |
Keywords | fine tuning; paraphrase generation; machine learning; pre-training; artificial intelligence; neural networks |
Language | English |
PageCount | 8 |
PublicationDate | 2024-03-01 |
PublicationPlace | New York |
PublicationTitle | Cybernetics and systems analysis |
PublicationTitleAbbrev | Cybern Syst Anal |
PublicationYear | 2024 |
Publisher | Springer US; Springer; Springer Nature B.V |
StartPage | 167 |
SubjectTerms | Artificial Intelligence; Computational linguistics; Control; Cybernetics; Language processing; Large language models; Machine learning; Mathematics; Mathematics and Statistics; Natural language interfaces; Natural language processing; Neural networks; Processor Architectures; Software Engineering/Programming and Operating Systems; Synthetic data; Systems Theory |
Title | Specialized Pre-Training of Neural Networks on Synthetic Data for Improving Paraphrase Generation |
URI | https://link.springer.com/article/10.1007/s10559-024-00658-7 https://www.proquest.com/docview/3034029480 |
Volume | 60 |