Comprehensive analysis of embeddings and pre-training in NLP
Published in | Computer science review Vol. 42; p. 100433 |
---|---|
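Main Authors | Tripathy, Jatin Karthik; Sethuraman, Sibi Chakkaravarthy; Cruz, Meenalosini Vimal; Namburu, Anupama; P., Mangalraj; R., Nandha Kumar; S, Sudhakar Ilango; Vijayakumar, Vaidehi |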
Format | Journal Article |
Language | English |
Published | Elsevier Inc, 01.11.2021 |
Subjects | |
Abstract | The amount of data and computing power has increased drastically over the last decade, which has led to the development of several new fronts in the field of Natural Language Processing (NLP). In addition, the entanglement of embeddings and large pre-trained models has pushed the field forward, covering a wide variety of tasks, from machine translation to more complex tasks such as contextual text classification. This paper covers the underlying idea behind all embeddings and pre-trained models and provides insight into the fundamental strategies and implementation details of innovative embeddings. Further, it presents the pros and cons of each specific embedding design and the associated impact on the result. It also compares the different strategies, datasets, and architectures discussed in different papers with the help of standard metrics used in NLP. The content covered in this review aims to shed light on different milestones reached in NLP, allowing readers to deepen their understanding of NLP and motivating them to explore the field further. |
---|---|
ArticleNumber | 100433 |
Author | Namburu, Anupama; Vijayakumar, Vaidehi; R., Nandha Kumar; Tripathy, Jatin Karthik; S, Sudhakar Ilango; P., Mangalraj; Sethuraman, Sibi Chakkaravarthy; Cruz, Meenalosini Vimal |
Author_xml | – sequence: 1 givenname: Jatin Karthik surname: Tripathy fullname: Tripathy, Jatin Karthik organization: School of Computer Science and Engineering, VIT-AP University, Andhra Pradesh, India – sequence: 2 givenname: Sibi Chakkaravarthy surname: Sethuraman fullname: Sethuraman, Sibi Chakkaravarthy email: sb.sibi@gmail.com organization: School of Computer Science and Engineering, VIT-AP University, Andhra Pradesh, India – sequence: 3 givenname: Meenalosini Vimal orcidid: 0000-0003-3164-4848 surname: Cruz fullname: Cruz, Meenalosini Vimal organization: Department of Information Technology, Georgia Southern University, GA, USA – sequence: 4 givenname: Anupama surname: Namburu fullname: Namburu, Anupama organization: School of Computer Science and Engineering, VIT-AP University, Andhra Pradesh, India – sequence: 5 givenname: Mangalraj surname: P. fullname: P., Mangalraj organization: School of Computer Science and Engineering, VIT-AP University, Andhra Pradesh, India – sequence: 6 givenname: Nandha Kumar surname: R. fullname: R., Nandha Kumar organization: School of Computer Science and Engineering, VIT-AP University, Andhra Pradesh, India – sequence: 7 givenname: Sudhakar Ilango surname: S fullname: S, Sudhakar Ilango organization: School of Computer Science and Engineering, VIT-AP University, Andhra Pradesh, India – sequence: 8 givenname: Vaidehi orcidid: 0000-0002-9524-5291 surname: Vijayakumar fullname: Vijayakumar, Vaidehi organization: Mother Teresa Women’s University, Kodaikanal, Tamilnadu, India |
CitedBy_id | crossref_primary_10_3390_technologies11050123 crossref_primary_10_48168_innosoft_s11_a88 crossref_primary_10_1016_j_eswa_2023_120439 crossref_primary_10_1016_j_techfore_2022_122306 crossref_primary_10_32604_csse_2023_036419 crossref_primary_10_3390_app122110765 crossref_primary_10_32604_iasc_2023_027848 |
Cites_doi | 10.3115/1073083.1073135 10.1109/ICCV.2015.11 10.1145/3331184.3331341 10.1016/j.neunet.2005.06.042 10.1145/3340531.3411908 10.3115/1289189.1289272 10.1145/1150402.1150464 10.1109/MSP.2012.2205597 10.3115/v1/D14-1162 10.3115/v1/P14-1023 10.1109/TKDE.2009.191 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9 10.1038/s41586-019-1923-7 10.1109/TASL.2011.2134090 10.1109/CVPR.2016.90 10.1109/5.726791 10.1162/tacl_a_00179 10.1109/72.279181 |
ContentType | Journal Article |
Copyright | 2021 Elsevier Inc. |
Copyright_xml | – notice: 2021 Elsevier Inc. |
DBID | AAYXX CITATION |
DOI | 10.1016/j.cosrev.2021.100433 |
DatabaseName | CrossRef |
DatabaseTitle | CrossRef |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISSN | 1876-7745 |
ExternalDocumentID | 10_1016_j_cosrev_2021_100433 S1574013721000733 |
ID | FETCH-LOGICAL-c306t-649c6ccad283d33c8deed1c5a6b0f72f3c77d1243dc867215e0ef0353b3138b03 |
IEDL.DBID | AIKHN |
ISSN | 1574-0137 |
IngestDate | Thu Sep 26 16:27:37 EDT 2024 Fri Feb 23 02:42:47 EST 2024 |
IsPeerReviewed | false |
IsScholarly | true |
Keywords | Attention mechanism; Embedding; NLP; Pre-training model; Natural Language Processing |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c306t-649c6ccad283d33c8deed1c5a6b0f72f3c77d1243dc867215e0ef0353b3138b03 |
ORCID | 0000-0003-3164-4848; 0000-0002-9524-5291 |
ParticipantIDs | crossref_primary_10_1016_j_cosrev_2021_100433 elsevier_sciencedirect_doi_10_1016_j_cosrev_2021_100433 |
PublicationCentury | 2000 |
PublicationDate | November 2021 2021-11-00 |
PublicationDateYYYYMMDD | 2021-11-01 |
PublicationDate_xml | – month: 11 year: 2021 text: November 2021 |
PublicationDecade | 2020 |
PublicationTitle | Computer science review |
PublicationYear | 2021 |
Publisher | Elsevier Inc |
Publisher_xml | – name: Elsevier Inc |
doi: 10.1109/TASL.2011.2134090 contributor: fullname: Dahl – year: 2019 ident: 10.1016/j.cosrev.2021.100433_b59 contributor: fullname: Lan – year: 2018 ident: 10.1016/j.cosrev.2021.100433_b73 contributor: fullname: Wang – start-page: 503 year: 2016 ident: 10.1016/j.cosrev.2021.100433_b76 article-title: Sequence-to-sequence model with attention for time series classification contributor: fullname: Tang – start-page: 1462 year: 2015 ident: 10.1016/j.cosrev.2021.100433_b10 article-title: Draw: A recurrent neural network for image generation contributor: fullname: Gregor – year: 2014 ident: 10.1016/j.cosrev.2021.100433_b12 contributor: fullname: Bahdanau – year: 2015 ident: 10.1016/j.cosrev.2021.100433_b15 contributor: fullname: Luong – ident: 10.1016/j.cosrev.2021.100433_b17 doi: 10.1109/CVPR.2016.90 – year: 2018 ident: 10.1016/j.cosrev.2021.100433_b48 contributor: fullname: Devlin – year: 2018 ident: 10.1016/j.cosrev.2021.100433_b8 contributor: fullname: Lu – volume: 86 start-page: 2278 issue: 11 year: 1998 ident: 10.1016/j.cosrev.2021.100433_b5 article-title: Gradient-based learning applied to document recognition publication-title: Proc. IEEE doi: 10.1109/5.726791 contributor: fullname: LeCun – volume: 2 start-page: 231 year: 2014 ident: 10.1016/j.cosrev.2021.100433_b29 article-title: Entity linking meets word sense disambiguation: A unified approach publication-title: Trans. Assoc. Comput. Linguist. doi: 10.1162/tacl_a_00179 contributor: fullname: Moro – year: 2018 ident: 10.1016/j.cosrev.2021.100433_b52 contributor: fullname: Trinh – volume: 91 issue: 1 year: 1991 ident: 10.1016/j.cosrev.2021.100433_b31 article-title: Untersuchungen zu dynamischen neuronalen netzen publication-title: Diploma Tech. Univ. München contributor: fullname: Hochreiter – volume: 5 start-page: 157 issue: 2 year: 1994 ident: 10.1016/j.cosrev.2021.100433_b75 article-title: Learning long-term dependencies with gradient descent is difficult publication-title: IEEE Trans. Neural Netw. 
doi: 10.1109/72.279181 contributor: fullname: Bengio – year: 2016 ident: 10.1016/j.cosrev.2021.100433_b50 contributor: fullname: Wu – year: 2021 ident: 10.1016/j.cosrev.2021.100433_b62 contributor: fullname: Fedus – year: 2020 ident: 10.1016/j.cosrev.2021.100433_b66 contributor: fullname: Clark – year: 2015 ident: 10.1016/j.cosrev.2021.100433_b37 contributor: fullname: Rocktäschel – year: 2015 ident: 10.1016/j.cosrev.2021.100433_b45 contributor: fullname: Sennrich |
SecondaryResourceType | review_article |
StartPage | 100433 |
SubjectTerms | Attention mechanism; Embedding; Natural Language Processing; NLP; Pre-training model |
Title | Comprehensive analysis of embeddings and pre-training in NLP |
URI | https://dx.doi.org/10.1016/j.cosrev.2021.100433 |
Volume | 42 |
Authors (as listed in record) | Tripathy, Jatin Karthik; Sethuraman, Sibi Chakkaravarthy; Cruz, Meenalosini Vimal; Namburu, Anupama |
ISSN | 1574-0137 |
EISSN | 1876-7745 |