Effectiveness of Deep Networks in NLP using BiDAF as an example architecture

Bibliographic Details
Published in	arXiv.org
Main Author	Sarkar, Soumyendu
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 31.08.2021
Subjects	Coders; Context; Model accuracy; Modelling; Natural language processing; Networks
Abstract	Question answering in NLP has progressed through advanced model architectures such as BERT and BiDAF, building on earlier word, character, and context-based embeddings. Now that BERT has leapfrogged the accuracy of earlier models, one element of the next frontier may be the introduction of deep networks and an effective way to train them. In this context, I explored the effectiveness of deep networks, focusing on the model encoder layer of BiDAF. With its heterogeneous layers, BiDAF provides the opportunity not only to explore the effectiveness of deep networks but also to evaluate whether refinements made in the lower layers are additive to refinements made in the upper layers of the model architecture. I believe the next great NLP model will fold solid language modeling, like BERT's, into a composite architecture that brings in refinements beyond generic language modeling and has a more extensive layered architecture. I experimented with Bypass network, Residual Highway network, and DenseNet architectures. In addition, I evaluated the effectiveness of ensembling the last few layers of the network. I also studied the difference that adding character embeddings to the word embeddings makes, and whether the effects are additive with deep networks. My studies indicate that deep networks are in fact effective in boosting performance. Also, refinements in the lower layers, such as embeddings, carry over additively to the gains made through deep networks.
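
The abstract names three families of skip connection without describing them. The sketch below shows the generic textbook forms of a residual connection, a highway gate, and DenseNet-style concatenation, using a small feed-forward layer as a stand-in for the recurrent model encoder layers of BiDAF. It is an illustration under stated assumptions, not the paper's implementation: all function names, layer sizes, and the feed-forward substitution are hypothetical, and the paper's "Bypass" and "Residual Highway" variants may differ in detail.

```python
import math
import random

def affine(x, w, b):
    """Affine map: one output per row of the weight matrix w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def init_layer(n_out, n_in, rng):
    """Hypothetical helper: small random weights, zero biases."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def residual_block(x, params):
    """Residual connection: output = x + f(x), so f only learns a correction."""
    h = relu(affine(x, *params))
    return [xi + hi for xi, hi in zip(x, h)]

def highway_block(x, params, gate_params):
    """Highway connection: a learned gate t mixes f(x) with the untouched input."""
    h = relu(affine(x, *params))
    t = sigmoid(affine(x, *gate_params))
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

def dense_stack(x, layer_params):
    """DenseNet-style stack: each layer sees the concatenation of the input and
    every earlier layer's output, and appends its own output to that list."""
    features = list(x)
    for params in layer_params:
        features = features + relu(affine(features, *params))
    return features

# Example wiring (assumed sizes): input width 4, three dense layers of width 2.
rng = random.Random(0)
x = [1.0, -2.0, 0.5, 3.0]
res_out = residual_block(x, init_layer(4, 4, rng))
hwy_out = highway_block(x, init_layer(4, 4, rng), init_layer(4, 4, rng))
growth = 2
dense_params = [init_layer(growth, 4 + i * growth, rng) for i in range(3)]
dense_out = dense_stack(x, dense_params)  # width 4 + 3 * 2 = 10
```

In all three forms the input reaches later layers unchanged along at least one path, which is the usual argument for why such connections make deeper stacks easier to train.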
Copyright	2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline	Physics
EISSN	2331-8422
Genre	Working Paper/Pre-Print
URI	https://www.proquest.com/docview/2568399287