Data augmentation techniques in natural language processing

Bibliographic Details
Published in Applied soft computing, Vol. 132; p. 109803
Main Authors Pellicer, Lucas Francisco Amaral Orosco; Ferreira, Taynan Maier; Costa, Anna Helena Reali
Format Journal Article
Language English
Published Elsevier B.V. 01.01.2023
Abstract Data Augmentation (DA) methods, a family of techniques designed for synthetic generation of training data, have shown remarkable results in various Deep Learning and Machine Learning tasks. Despite the widespread and successful adoption of DA within the computer vision community, DA techniques designed for natural language processing (NLP) tasks have advanced much more slowly and have had limited success in achieving performance gains. As a consequence, with the exception of applications of back-translation to machine translation tasks, these techniques have not been as thoroughly explored by the wider NLP community. Recent research on the subject still lacks a proper practical understanding of the relationship between the various existing DA methods. The connection between DA methods and several important aspects of their outputs, such as lexical diversity and semantic fidelity, is also still poorly understood. In this work, we perform a comprehensive study of NLP DA techniques, comparing their relative performance under different settings. We analyze the quality of the synthetic data generated, evaluate its performance gains, and compare all of these aspects to previously existing DA procedures.
• This article compares Data Augmentation techniques for texts.
• This article examines the lexical diversity and semantic fidelity of these techniques.
• Back-translation algorithms and paraphrasers exhibit similar behavior.
• LAMBADA Data Augmentation leads to greater diversity but low fidelity.
• With more data, heavy algorithms do not pay off compared to light ones.
ArticleNumber 109803
Author_xml – sequence: 1
  givenname: Lucas Francisco Amaral Orosco
  orcidid: 0000-0003-2827-7602
  surname: Pellicer
  fullname: Pellicer, Lucas Francisco Amaral Orosco
  email: lucas.pellicer3394@gmail.com, lucas.pellicer@usp.br
– sequence: 2
  givenname: Taynan Maier
  surname: Ferreira
  fullname: Ferreira, Taynan Maier
– sequence: 3
  givenname: Anna Helena Reali
  surname: Costa
  fullname: Costa, Anna Helena Reali
ContentType Journal Article
Copyright 2022 The Authors
DOI 10.1016/j.asoc.2022.109803
Discipline Computer Science
ISSN 1568-4946
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Back-translation
Data augmentation
Natural language processing
Machine learning
License This is an open access article under the CC BY-NC-ND license.
ORCID 0000-0003-2827-7602
OpenAccessLink https://www.sciencedirect.com/science/article/pii/S1568494622008523
PublicationDate January 2023
PublicationTitle Applied soft computing
PublicationYear 2023
Publisher Elsevier B.V
  publication-title: Advances in Neural Information Processing Systems, Vol. 28
– year: 2013
  ident: b29
  article-title: Machine Learning: A Probabilistic Perspective
– start-page: 758
  year: 2013
  end-page: 764
  ident: b62
  article-title: PPDB: The paraphrase database
  publication-title: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
– year: 2020
  ident: b89
  article-title: Making monolingual sentence embeddings multilingual using knowledge distillation
  publication-title: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing
– reference: L. Gonog, Y. Zhou, A Review: Generative Adversarial Networks, in: 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA), 2019, pp. 505–510.
– start-page: 7556
  year: 2020
  end-page: 7566
  ident: b43
  article-title: Good-enough compositional data augmentation
  publication-title: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
– reference: J.T. Springenberg, A. Dosovitskiy, T. Brox, M.A. Riedmiller, Striving for Simplicity: The All Convolutional Net, in: Y. Bengio, Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Workshop Track Proceedings, 2015.
– year: 1997
  ident: b25
  article-title: Machine Learning
– year: 2020
  ident: b72
  article-title: Data boost: Text data augmentation through reinforcement learning guided conditional generation
  publication-title: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
– start-page: 1
  year: 2018
  end-page: 17
  ident: b22
  article-title: SemEval-2018 task 1: Affect in tweets
  publication-title: Proceedings of the 12th International Workshop on Semantic Evaluation
– reference: T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient Estimation of Word Representations in Vector Space, in: Y. Bengio, Y. LeCun (Eds.), 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, 2013.
– year: 2018
  ident: b38
  article-title: Learning from between-class examples for deep sound recognition
  publication-title: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings
– start-page: 55
  year: 2018
  end-page: 63
  ident: b68
  article-title: Enhancement of encoder and attention using target monolingual corpora in neural machine translation
  publication-title: Proceedings of the 2nd Workshop on Neural Machine Translation and Generation
– volume: vol. 97
  start-page: 2731
  year: 2019
  end-page: 2741
  ident: b20
  article-title: Population based augmentation: Efficient learning of augmentation policy schedules
  publication-title: Proceedings of the 36th International Conference on Machine Learning
– start-page: 86
  year: 2016
  end-page: 96
  ident: b64
  article-title: Improving neural machine translation models with monolingual data
  publication-title: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
– year: 2018
  ident: b46
  article-title: Text data augmentation made simple by leveraging NLP cloud APIs
– reference: J.E. Hu, R. Rudinger, M. Post, B.V. Durme, PARABANK: Monolingual bitext generation and sentential paraphrasing via lexically-constrained neural machine translation, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, 2019, pp. 6521–6528.
– start-page: 5486
  year: 2018
  end-page: 5494
  ident: b39
  article-title: Between-class learning for image classification
  publication-title: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018
– year: 2020
  ident: b40
  article-title: FMix: Enhancing mixed sample data augmentation
– year: 2016
  ident: b1
  article-title: Deep Learning
– start-page: 3687
  year: 2018
  end-page: 3697
  ident: b87
  article-title: CARER: Contextualized affect representations for emotion recognition
  publication-title: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
– start-page: 6382
  year: 2019
  end-page: 6388
  ident: b54
  article-title: EDA: Easy data augmentation techniques for boosting performance on text classification tasks
  publication-title: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
– start-page: 7383
  year: 2020
  end-page: 7390
  ident: b71
  article-title: Do not have enough data? Deep learning to the rescue!
  publication-title: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34
– reference: I.J. Goodfellow, J. Shlens, C. Szegedy, Explaining and Harnessing Adversarial Examples, in: Y. Bengio, Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
– year: 2021
  ident: b74
  article-title: Parrot: Paraphrase generation for NLU
– year: 2022
  ident: 10.1016/j.asoc.2022.109803_b32
  article-title: A survey on data augmentation for text classification
  publication-title: ACM Comput. Surv.
– start-page: 7556
  year: 2020
  ident: 10.1016/j.asoc.2022.109803_b43
  article-title: Good-enough compositional data augmentation
– start-page: 7383
  year: 2020
  ident: 10.1016/j.asoc.2022.109803_b71
  article-title: Do not have enough data? Deep learning to the rescue!
– year: 2006
  ident: 10.1016/j.asoc.2022.109803_b30
– start-page: 45
  year: 2019
  ident: 10.1016/j.asoc.2022.109803_b63
  article-title: Generalizing back-translation in neural machine translation
– year: 2020
  ident: 10.1016/j.asoc.2022.109803_b92
– year: 2021
  ident: 10.1016/j.asoc.2022.109803_b74
– volume: 6
  year: 2019
  ident: 10.1016/j.asoc.2022.109803_b2
  article-title: A survey on image data augmentation for deep learning
  publication-title: J. Big Data
  doi: 10.1186/s40537-019-0197-0
– year: 2019
  ident: 10.1016/j.asoc.2022.109803_b42
– year: 2021
  ident: 10.1016/j.asoc.2022.109803_b3
– ident: 10.1016/j.asoc.2022.109803_b48
– year: 2013
  ident: 10.1016/j.asoc.2022.109803_b29
– year: 2020
  ident: 10.1016/j.asoc.2022.109803_b80
– ident: 10.1016/j.asoc.2022.109803_b12
– start-page: 452
  year: 2018
  ident: 10.1016/j.asoc.2022.109803_b9
  article-title: Contextual augmentation: Data augmentation by words with paradigmatic relations
– start-page: 200
  year: 2020
  ident: 10.1016/j.asoc.2022.109803_b79
  article-title: Quantifying the evaluation of heuristic methods for textual data augmentation
– ident: 10.1016/j.asoc.2022.109803_b5
  doi: 10.1109/IIPHDW.2018.8388338
– start-page: 2672
  year: 2014
  ident: 10.1016/j.asoc.2022.109803_b34
  article-title: Generative adversarial nets
– ident: 10.1016/j.asoc.2022.109803_b57
– ident: 10.1016/j.asoc.2022.109803_b4
  doi: 10.1109/SSCI.2018.8628742
– ident: 10.1016/j.asoc.2022.109803_b13
  doi: 10.1109/CVPR.2016.90
– year: 2017
  ident: 10.1016/j.asoc.2022.109803_b45
  article-title: Data noising as smoothing in neural network language models
– start-page: 1
  year: 2018
  ident: 10.1016/j.asoc.2022.109803_b22
  article-title: SemEval-2018 task 1: Affect in tweets
– start-page: 84
  year: 2019
  ident: 10.1016/j.asoc.2022.109803_b59
  article-title: Conditional BERT contextual augmentation
  doi: 10.1007/978-3-030-22747-0_7
– start-page: 55
  year: 2018
  ident: 10.1016/j.asoc.2022.109803_b68
  article-title: Enhancement of encoder and attention using target monolingual corpora in neural machine translation
– start-page: 95
  year: 2018
  ident: 10.1016/j.asoc.2022.109803_b17
  article-title: Further advantages of data augmentation on convolutional neural networks
– start-page: 35
  year: 2019
  ident: 10.1016/j.asoc.2022.109803_b10
  article-title: Data augmentation using back-translation for context-aware neural machine translation
– ident: 10.1016/j.asoc.2022.109803_b47
– year: 2020
  ident: 10.1016/j.asoc.2022.109803_b78
– year: 2020
  ident: 10.1016/j.asoc.2022.109803_b40
– start-page: 5486
  year: 2018
  ident: 10.1016/j.asoc.2022.109803_b39
  article-title: Between-class learning for image classification
– start-page: 5564
  year: 2019
  ident: 10.1016/j.asoc.2022.109803_b49
  article-title: Generating fluent adversarial examples for natural languages
– start-page: 425
  year: 2015
  ident: 10.1016/j.asoc.2022.109803_b82
  article-title: PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification
– volume: 7
  start-page: 36322
  year: 2019
  ident: 10.1016/j.asoc.2022.109803_b36
  article-title: Recent progress on generative adversarial networks (GANs): A survey
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2019.2905015
– start-page: 6382
  year: 2019
  ident: 10.1016/j.asoc.2022.109803_b54
  article-title: EDA: Easy data augmentation techniques for boosting performance on text classification tasks
– ident: 10.1016/j.asoc.2022.109803_b86
  doi: 10.3115/1072228.1072378
– start-page: 489
  year: 2018
  ident: 10.1016/j.asoc.2022.109803_b18
  article-title: Understanding back-translation at scale
– year: 2018
  ident: 10.1016/j.asoc.2022.109803_b65
– start-page: 1317
  year: 2019
  ident: 10.1016/j.asoc.2022.109803_b76
  article-title: Automatically learning data augmentation policies for dialogue tasks
– ident: 10.1016/j.asoc.2022.109803_b88
  doi: 10.1609/aaai.v33i01.33016521
– year: 2020
  ident: 10.1016/j.asoc.2022.109803_b72
  article-title: Data boost: Text data augmentation through reinforcement learning guided conditional generation
– year: 2019
  ident: 10.1016/j.asoc.2022.109803_b77
– volume: vol. 97
  start-page: 2731
  year: 2019
  ident: 10.1016/j.asoc.2022.109803_b20
  article-title: Population based augmentation: Efficient learning of augmentation policy schedules
– start-page: 54
  year: 2019
  ident: 10.1016/j.asoc.2022.109803_b23
  article-title: SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter
– volume: 11
  issue: 3
  year: 2020
  ident: 10.1016/j.asoc.2022.109803_b51
  article-title: Adversarial attacks on deep-learning models in natural language processing: A survey
  publication-title: ACM Trans. Intell. Syst. Technol.
  doi: 10.1145/3374217
– start-page: 86
  year: 2016
  ident: 10.1016/j.asoc.2022.109803_b64
  article-title: Improving neural machine translation models with monolingual data
– start-page: 39
  year: 2018
  ident: 10.1016/j.asoc.2022.109803_b85
  article-title: SemEval-2018 task 3: Irony detection in English tweets
– ident: 10.1016/j.asoc.2022.109803_b41
  doi: 10.1109/WACV.2019.00139
– start-page: 3861
  year: 2020
  ident: 10.1016/j.asoc.2022.109803_b70
  article-title: An analysis of simple data augmentation for named entity recognition
– ident: 10.1016/j.asoc.2022.109803_b37
  doi: 10.1109/ICASSP.2019.8682544
– ident: 10.1016/j.asoc.2022.109803_b35
  doi: 10.1109/ICIEA.2019.8833686
– start-page: 519
  year: 2017
  ident: 10.1016/j.asoc.2022.109803_b21
  article-title: SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
– year: 2019
  ident: 10.1016/j.asoc.2022.109803_b31
– start-page: 1097
  year: 2012
  ident: 10.1016/j.asoc.2022.109803_b11
  article-title: ImageNet classification with deep convolutional neural networks
– ident: 10.1016/j.asoc.2022.109803_b27
– start-page: 665
  year: 2005
  ident: 10.1016/j.asoc.2022.109803_b53
  article-title: WordNet and wordnets
– ident: 10.1016/j.asoc.2022.109803_b69
– start-page: 53
  year: 2019
  ident: 10.1016/j.asoc.2022.109803_b67
  article-title: Tagged back-translation
– start-page: 3687
  year: 2018
  ident: 10.1016/j.asoc.2022.109803_b87
  article-title: CARER: Contextualized affect representations for emotion recognition
– start-page: 758
  year: 2013
  ident: 10.1016/j.asoc.2022.109803_b62
  article-title: PPDB: The paraphrase database
– start-page: 2557
  year: 2015
  ident: 10.1016/j.asoc.2022.109803_b56
  article-title: That’s so annoying!!!: A lexical and frame-semantic embedding based data augmentation approach to automatic categorization of annoying behaviors using #petpeeve tweets
– start-page: 567
  year: 2017
  ident: 10.1016/j.asoc.2022.109803_b55
  article-title: Data augmentation for low-resource neural machine translation
– year: 2020
  ident: 10.1016/j.asoc.2022.109803_b50
– volume: 115
  start-page: 82
  year: 2019
  ident: 10.1016/j.asoc.2022.109803_b15
  article-title: Equivalence between dropout and data augmentation: A mathematical check
  publication-title: Neural Netw.: Off. J. Int. Neural Netw. Soc.
  doi: 10.1016/j.neunet.2019.03.013
– volume: 24
  start-page: 279
  issue: 3
  year: 2017
  ident: 10.1016/j.asoc.2022.109803_b7
  article-title: Deep convolutional neural networks and data augmentation for environmental sound classification
  publication-title: IEEE Signal Process. Lett.
  doi: 10.1109/LSP.2017.2657381
– start-page: 5004
  year: 2018
  ident: 10.1016/j.asoc.2022.109803_b60
  article-title: Data augmentation via dependency tree morphing for low-resource languages
– ident: 10.1016/j.asoc.2022.109803_b19
  doi: 10.1109/CVPR.2019.00020
– year: 2019
  ident: 10.1016/j.asoc.2022.109803_b84
– start-page: 649
  year: 2015
  ident: 10.1016/j.asoc.2022.109803_b52
  article-title: Character-level convolutional networks for text classification
– year: 2017
  ident: 10.1016/j.asoc.2022.109803_b73
– volume: vol. 12319
  start-page: 435
  year: 2020
  ident: 10.1016/j.asoc.2022.109803_b24
  article-title: Deepbt and NLP data augmentation techniques: A new proposal and a comprehensive study
– start-page: 18
  year: 2018
  ident: 10.1016/j.asoc.2022.109803_b66
  article-title: Iterative back-translation for neural machine translation
– start-page: 452
  year: 2018
  ident: 10.1016/j.asoc.2022.109803_b58
  article-title: Contextual augmentation: Data augmentation by words with paradigmatic relations
– year: 2018
  ident: 10.1016/j.asoc.2022.109803_b38
  article-title: Learning from between-class examples for deep sound recognition
– year: 1997
  ident: 10.1016/j.asoc.2022.109803_b25
– start-page: 5547
  year: 2020
  ident: 10.1016/j.asoc.2022.109803_b44
  article-title: Sequence-level mixed sample data augmentation
– year: 2018
  ident: 10.1016/j.asoc.2022.109803_b46
– year: 2017
  ident: 10.1016/j.asoc.2022.109803_b61
– ident: 10.1016/j.asoc.2022.109803_b81
  doi: 10.18653/v1/2020.emnlp-main.97
– year: 2015
  ident: 10.1016/j.asoc.2022.109803_b16
– start-page: 856
  year: 2018
  ident: 10.1016/j.asoc.2022.109803_b91
  article-title: SwitchOut: an efficient data augmentation algorithm for neural machine translation
– year: 2016
  ident: 10.1016/j.asoc.2022.109803_b1
– volume: vol. 97
  start-page: 1528
  year: 2019
  ident: 10.1016/j.asoc.2022.109803_b14
  article-title: A kernel theory of modern data augmentation
– year: 2020
  ident: 10.1016/j.asoc.2022.109803_b89
  article-title: Making monolingual sentence embeddings multilingual using knowledge distillation
– ident: 10.1016/j.asoc.2022.109803_b6
  doi: 10.21437/Interspeech.2019-2293
– ident: 10.1016/j.asoc.2022.109803_b90
– volume: 10
  start-page: 988
  issue: 5
  year: 1999
  ident: 10.1016/j.asoc.2022.109803_b26
  article-title: An overview of statistical learning theory
  publication-title: IEEE Trans. Neural Netw.
  doi: 10.1109/72.788640
– start-page: 416
  year: 2001
  ident: 10.1016/j.asoc.2022.109803_b28
  article-title: Vicinal risk minimization
– year: 2020
  ident: 10.1016/j.asoc.2022.109803_b8
– start-page: 968
  year: 2021
  ident: 10.1016/j.asoc.2022.109803_b33
  article-title: A survey of data augmentation approaches for NLP
– start-page: 2454
  year: 2021
  ident: 10.1016/j.asoc.2022.109803_b75
  article-title: ProtAugment: Intent detection meta-learning through unsupervised diverse paraphrasing
– year: 2014
  ident: 10.1016/j.asoc.2022.109803_b83
  article-title: The multilingual paraphrase database
StartPage 109803
SubjectTerms Back-translation
Data augmentation
Machine learning
Natural language processing
Title Data augmentation techniques in natural language processing
URI https://dx.doi.org/10.1016/j.asoc.2022.109803
Volume 132