Data augmentation techniques in natural language processing

Bibliographic Details
Published in	Applied Soft Computing, Vol. 132, Article 109803
Main Authors	Pellicer, Lucas Francisco Amaral Orosco; Ferreira, Taynan Maier; Costa, Anna Helena Reali
Format	Journal Article
Language	English
Published	Elsevier B.V., January 2023
Subjects	Back-translation; Data augmentation; Machine learning; Natural language processing

Abstract	Data Augmentation (DA) methods, a family of techniques designed for the synthetic generation of training data, have shown remarkable results in various Deep Learning and Machine Learning tasks. Despite their widespread and successful adoption within the computer vision community, DA techniques designed for natural language processing (NLP) tasks have advanced much more slowly and have had limited success in achieving performance gains. As a consequence, with the exception of back-translation applied to machine translation tasks, these techniques have not been explored as thoroughly by the wider NLP community. Recent research on the subject still lacks a practical understanding of the relationships among the existing DA methods. The connection between DA methods and important properties of their outputs, such as lexical diversity and semantic fidelity, is also still poorly understood. In this work, we perform a comprehensive study of NLP DA techniques, comparing their relative performance under different settings. We analyze the quality of the generated synthetic data, evaluate the resulting performance gains, and compare all of these aspects with previously existing DA procedures.
• This article compares Data Augmentation techniques for texts.
• This article assesses the lexical diversity and semantic fidelity of each technique.
• Back-translation algorithms and paraphrasers exhibit similar behavior.
• LAMBADA data augmentation generates greater diversity but low fidelity.
• With more data, heavy algorithms do not pay off compared to light ones.
Article Number	109803
Author details	Pellicer, Lucas Francisco Amaral Orosco (ORCID 0000-0003-2827-7602; lucas.pellicer3394@gmail.com, lucas.pellicer@usp.br); Ferreira, Taynan Maier; Costa, Anna Helena Reali
Copyright	2022 The Authors
DOI	10.1016/j.asoc.2022.109803
Discipline	Computer Science
ISSN	1568-4946
Open Access	Yes
Peer Reviewed	Yes
License	This is an open access article under the CC BY-NC-ND license.
ORCID	0000-0003-2827-7602 (Pellicer, Lucas Francisco Amaral Orosco)
Open Access Link	https://www.sciencedirect.com/science/article/pii/S1568494622008523
Rudinger, M. Post, B.V. Durme, PARABANK: Monolingual bitext generation and sentential paraphrasing via lexically-constrained neural machine translation, 33 (2019) 6521–6528. – start-page: 5486 year: 2018 end-page: 5494 ident: b39 article-title: Between-class learning for image classification publication-title: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018 – year: 2020 ident: b40 article-title: FMix: Enhancing mixed sample data augmentation – year: 2016 ident: b1 article-title: Deep Learning – start-page: 3687 year: 2018 end-page: 3697 ident: b87 article-title: CARER: Contextualized affect representations for emotion recognition publication-title: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing – start-page: 6382 year: 2019 end-page: 6388 ident: b54 article-title: EDA: Easy data augmentation techniques for boosting performance on text classification tasks publication-title: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) – start-page: 7383 year: 2020 end-page: 7390 ident: b71 article-title: Do not have enough data? Deep learning to the rescue! publication-title: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34 – reference: I.J. Goodfellow, J. Shlens, C. Szegedy, Explaining and Harnessing Adversarial Examples, in: Y. Bengio, Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015, URL:. – year: 2021 ident: b74 article-title: Parrot: Paraphrase generation for NLU – year: 2022 ident: 10.1016/j.asoc.2022.109803_b32 article-title: A survey on data augmentation for text classification publication-title: ACM Comput. Surv. 
– start-page: 7556 year: 2020 ident: 10.1016/j.asoc.2022.109803_b43 article-title: Good-enough compositional data augmentation – start-page: 7383 year: 2020 ident: 10.1016/j.asoc.2022.109803_b71 article-title: Do not have enough data? Deep learning to the rescue! – year: 2006 ident: 10.1016/j.asoc.2022.109803_b30 – start-page: 45 year: 2019 ident: 10.1016/j.asoc.2022.109803_b63 article-title: Generalizing back-translation in neural machine translation – year: 2020 ident: 10.1016/j.asoc.2022.109803_b92 – year: 2021 ident: 10.1016/j.asoc.2022.109803_b74 – volume: 6 year: 2019 ident: 10.1016/j.asoc.2022.109803_b2 article-title: A survey on image data augmentation for deep learning publication-title: J. Big Data doi: 10.1186/s40537-019-0197-0 – year: 2019 ident: 10.1016/j.asoc.2022.109803_b42 – year: 2021 ident: 10.1016/j.asoc.2022.109803_b3 – ident: 10.1016/j.asoc.2022.109803_b48 – year: 2013 ident: 10.1016/j.asoc.2022.109803_b29 – year: 2020 ident: 10.1016/j.asoc.2022.109803_b80 – ident: 10.1016/j.asoc.2022.109803_b12 – start-page: 452 year: 2018 ident: 10.1016/j.asoc.2022.109803_b9 article-title: Contextual augmentation: Data augmentation by words with paradigmatic relations – start-page: 200 year: 2020 ident: 10.1016/j.asoc.2022.109803_b79 article-title: Quantifying the evaluation of heuristic methods for textual data augmentation – ident: 10.1016/j.asoc.2022.109803_b5 doi: 10.1109/IIPHDW.2018.8388338 – start-page: 2672 year: 2014 ident: 10.1016/j.asoc.2022.109803_b34 article-title: Generative adversarial nets – ident: 10.1016/j.asoc.2022.109803_b57 – ident: 10.1016/j.asoc.2022.109803_b4 doi: 10.1109/SSCI.2018.8628742 – ident: 10.1016/j.asoc.2022.109803_b13 doi: 10.1109/CVPR.2016.90 – year: 2017 ident: 10.1016/j.asoc.2022.109803_b45 article-title: Data noising as smoothing in neural network language models – start-page: 1 year: 2018 ident: 10.1016/j.asoc.2022.109803_b22 article-title: SemEval-2018 task 1: Affect in tweets – start-page: 84 year: 2019 ident: 
10.1016/j.asoc.2022.109803_b59 article-title: Conditional BERT contextual augmentation doi: 10.1007/978-3-030-22747-0_7 – start-page: 55 year: 2018 ident: 10.1016/j.asoc.2022.109803_b68 article-title: Enhancement of encoder and attention using target monolingual corpora in neural machine translation – start-page: 95 year: 2018 ident: 10.1016/j.asoc.2022.109803_b17 article-title: Further advantages of data augmentation on convolutional neural networks – start-page: 35 year: 2019 ident: 10.1016/j.asoc.2022.109803_b10 article-title: Data augmentation using back-translation for context-aware neural machine translation – ident: 10.1016/j.asoc.2022.109803_b47 – year: 2020 ident: 10.1016/j.asoc.2022.109803_b78 – year: 2020 ident: 10.1016/j.asoc.2022.109803_b40 – start-page: 5486 year: 2018 ident: 10.1016/j.asoc.2022.109803_b39 article-title: Between-class learning for image classification – start-page: 5564 year: 2019 ident: 10.1016/j.asoc.2022.109803_b49 article-title: Generating fluent adversarial examples for natural languages – start-page: 425 year: 2015 ident: 10.1016/j.asoc.2022.109803_b82 article-title: PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification – volume: 7 start-page: 36322 year: 2019 ident: 10.1016/j.asoc.2022.109803_b36 article-title: Recent progress on generative adversarial networks (GANs): A survey publication-title: IEEE Access doi: 10.1109/ACCESS.2019.2905015 – start-page: 6382 year: 2019 ident: 10.1016/j.asoc.2022.109803_b54 article-title: EDA: Easy data augmentation techniques for boosting performance on text classification tasks – ident: 10.1016/j.asoc.2022.109803_b86 doi: 10.3115/1072228.1072378 – start-page: 489 year: 2018 ident: 10.1016/j.asoc.2022.109803_b18 article-title: Understanding back-translation at scale – year: 2018 ident: 10.1016/j.asoc.2022.109803_b65 – start-page: 1317 year: 2019 ident: 10.1016/j.asoc.2022.109803_b76 article-title: Automatically learning data 
augmentation policies for dialogue tasks – ident: 10.1016/j.asoc.2022.109803_b88 doi: 10.1609/aaai.v33i01.33016521 – year: 2020 ident: 10.1016/j.asoc.2022.109803_b72 article-title: Data boost: Text data augmentation through reinforcement learning guided conditional generation – year: 2019 ident: 10.1016/j.asoc.2022.109803_b77 – volume: vol. 97 start-page: 2731 year: 2019 ident: 10.1016/j.asoc.2022.109803_b20 article-title: Population based augmentation: Efficient learning of augmentation policy schedules – start-page: 54 year: 2019 ident: 10.1016/j.asoc.2022.109803_b23 article-title: SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter – volume: 11 issue: 3 year: 2020 ident: 10.1016/j.asoc.2022.109803_b51 article-title: Adversarial attacks on deep-learning models in natural language processing: A survey publication-title: ACM Trans. Intell. Syst. Technol. doi: 10.1145/3374217 – start-page: 86 year: 2016 ident: 10.1016/j.asoc.2022.109803_b64 article-title: Improving neural machine translation models with monolingual data – start-page: 39 year: 2018 ident: 10.1016/j.asoc.2022.109803_b85 article-title: SemEval-2018 task 3: Irony detection in english tweets – ident: 10.1016/j.asoc.2022.109803_b41 doi: 10.1109/WACV.2019.00139 – start-page: 3861 year: 2020 ident: 10.1016/j.asoc.2022.109803_b70 article-title: An analysis of simple data augmentation for named entity recognition – ident: 10.1016/j.asoc.2022.109803_b37 doi: 10.1109/ICASSP.2019.8682544 – ident: 10.1016/j.asoc.2022.109803_b35 doi: 10.1109/ICIEA.2019.8833686 – start-page: 519 year: 2017 ident: 10.1016/j.asoc.2022.109803_b21 article-title: SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News – year: 2019 ident: 10.1016/j.asoc.2022.109803_b31 – start-page: 1097 year: 2012 ident: 10.1016/j.asoc.2022.109803_b11 article-title: ImageNet classification with deep convolutional neural networks – ident: 10.1016/j.asoc.2022.109803_b27 – 
start-page: 665 year: 2005 ident: 10.1016/j.asoc.2022.109803_b53 article-title: WordNet and wordnets – ident: 10.1016/j.asoc.2022.109803_b69 – start-page: 53 year: 2019 ident: 10.1016/j.asoc.2022.109803_b67 article-title: Tagged back-translation – start-page: 3687 year: 2018 ident: 10.1016/j.asoc.2022.109803_b87 article-title: CARER: Contextualized affect representations for emotion recognition – start-page: 758 year: 2013 ident: 10.1016/j.asoc.2022.109803_b62 article-title: PPDB: The paraphrase database – start-page: 2557 year: 2015 ident: 10.1016/j.asoc.2022.109803_b56 article-title: That’s so annoying!!!: A lexical and frame-semantic embedding based data augmentation approach to automatic categorization of annoying behaviors using #petpeeve tweets – start-page: 567 year: 2017 ident: 10.1016/j.asoc.2022.109803_b55 article-title: Data augmentation for low-resource neural machine translation – year: 2020 ident: 10.1016/j.asoc.2022.109803_b50 – volume: 115 start-page: 82 year: 2019 ident: 10.1016/j.asoc.2022.109803_b15 article-title: Equivalence between dropout and data augmentation: A mathematical check publication-title: Neural Netw.: Off. J. Int. Neural Netw. Soc. doi: 10.1016/j.neunet.2019.03.013 – volume: 24 start-page: 279 issue: 3 year: 2017 ident: 10.1016/j.asoc.2022.109803_b7 article-title: Deep convolutional neural networks and data augmentation for environmental sound classification publication-title: IEEE Signal Process. Lett. doi: 10.1109/LSP.2017.2657381 – start-page: 5004 year: 2018 ident: 10.1016/j.asoc.2022.109803_b60 article-title: Data augmentation via dependency tree morphing for low-resource languages – ident: 10.1016/j.asoc.2022.109803_b19 doi: 10.1109/CVPR.2019.00020 – year: 2019 ident: 10.1016/j.asoc.2022.109803_b84 – start-page: 649 year: 2015 ident: 10.1016/j.asoc.2022.109803_b52 article-title: Character-level convolutional networks for text classification – year: 2017 ident: 10.1016/j.asoc.2022.109803_b73 – volume: vol. 
12319 start-page: 435 year: 2020 ident: 10.1016/j.asoc.2022.109803_b24 article-title: Deepbt and NLP data augmentation techniques: A new proposal and a comprehensive study – start-page: 18 year: 2018 ident: 10.1016/j.asoc.2022.109803_b66 article-title: Iterative back-translation for neural machine translation – start-page: 452 year: 2018 ident: 10.1016/j.asoc.2022.109803_b58 article-title: Contextual augmentation: Data augmentation by words with paradigmatic relations – year: 2018 ident: 10.1016/j.asoc.2022.109803_b38 article-title: Learning from between-class examples for deep sound recognition – year: 1997 ident: 10.1016/j.asoc.2022.109803_b25 – start-page: 5547 year: 2020 ident: 10.1016/j.asoc.2022.109803_b44 article-title: Sequence-level mixed sample data augmentation – year: 2018 ident: 10.1016/j.asoc.2022.109803_b46 – year: 2017 ident: 10.1016/j.asoc.2022.109803_b61 – ident: 10.1016/j.asoc.2022.109803_b81 doi: 10.18653/v1/2020.emnlp-main.97 – year: 2015 ident: 10.1016/j.asoc.2022.109803_b16 – start-page: 856 year: 2018 ident: 10.1016/j.asoc.2022.109803_b91 article-title: SwitchOut: an efficient data augmentation algorithm for neural machine translation – year: 2016 ident: 10.1016/j.asoc.2022.109803_b1 – volume: vol. 97 start-page: 1528 year: 2019 ident: 10.1016/j.asoc.2022.109803_b14 article-title: A kernel theory of modern data augmentation – year: 2020 ident: 10.1016/j.asoc.2022.109803_b89 article-title: Making monolingual sentence embeddings multilingual using knowledge distillation – ident: 10.1016/j.asoc.2022.109803_b6 doi: 10.21437/Interspeech.2019-2293 – ident: 10.1016/j.asoc.2022.109803_b90 – volume: 10 start-page: 988 issue: 5 year: 1999 ident: 10.1016/j.asoc.2022.109803_b26 article-title: An overview of statistical learning theory publication-title: IEEE Trans. Neural Netw. 
doi: 10.1109/72.788640 – start-page: 416 year: 2001 ident: 10.1016/j.asoc.2022.109803_b28 article-title: Vicinal risk minimization – year: 2020 ident: 10.1016/j.asoc.2022.109803_b8 – start-page: 968 year: 2021 ident: 10.1016/j.asoc.2022.109803_b33 article-title: A survey of data augmentation approaches for NLP – start-page: 2454 year: 2021 ident: 10.1016/j.asoc.2022.109803_b75 article-title: ProtAugment: Intent detection meta-learning through unsupervised diverse paraphrasing – year: 2014 ident: 10.1016/j.asoc.2022.109803_b83 article-title: The multilingual paraphrase database
StartPage	109803
SubjectTerms	Back-translation; Data augmentation; Machine learning; Natural language processing
Title	Data augmentation techniques in natural language processing
URI	https://dx.doi.org/10.1016/j.asoc.2022.109803
Volume	132