BERTQA -- Attention on Steroids
Published in | arXiv.org |
---|---|
Main Authors | Chadha, Ankit; Sood, Rewa |
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 14.12.2019 |
Subjects | Architecture; Coders; Convolution; Datasets; Feature extraction; Queries; Steroids |
Online Access | https://www.proquest.com/docview/2330263803 |
Abstract | In this work, we extend the Bidirectional Encoder Representations from Transformers (BERT) with an emphasis on directed coattention to obtain an improved F1 performance on the SQUAD2.0 dataset. The Transformer architecture on which BERT is based places hierarchical global attention on the concatenation of the context and query. Our additions to the BERT architecture augment this attention with a more focused context to query (C2Q) and query to context (Q2C) attention via a set of modified Transformer encoder units. In addition, we explore adding convolution-based feature extraction within the coattention architecture to add localized information to self-attention. We found that coattention significantly improves the no answer F1 by 4 points in the base and 1 point in the large architecture. After adding skip connections the no answer F1 improved further without causing an additional loss in has answer F1. The addition of localized feature extraction added to attention produced an overall dev F1 of 77.03 in the base architecture. We applied our findings to the large BERT model which contains twice as many layers and further used our own augmented version of the SQUAD 2.0 dataset created by back translation, which we have named SQUAD 2.Q. Finally, we performed hyperparameter tuning and ensembled our best models for a final F1/EM of 82.317/79.442 (Attention on Steroids, PCE Test Leaderboard). |
---|---|
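The abstract's central modification is directed coattention: instead of BERT's single self-attention over the concatenated context and query, the context attends to the query (C2Q) and the query attends to the context (Q2C) through separate attention units, with skip connections around them. The sketch below is a minimal illustration of that idea, assuming PyTorch; the class name, the use of nn.MultiheadAttention, and the dimensions are assumptions for illustration, not the paper's actual modified Transformer encoder units, and the convolution-based feature extraction the abstract also mentions is omitted.

```python
# A minimal sketch, assuming PyTorch, of directed coattention (C2Q and Q2C)
# with skip connections as described in the abstract.  Names and dimensions
# are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class DirectedCoattention(nn.Module):
    def __init__(self, hidden_size: int = 768, num_heads: int = 12):
        super().__init__()
        # Context-to-query (C2Q): context tokens attend over query tokens.
        self.c2q = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # Query-to-context (Q2C): query tokens attend over context tokens.
        self.q2c = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, context: torch.Tensor, query: torch.Tensor):
        # context: (batch, ctx_len, hidden); query: (batch, qry_len, hidden)
        c2q_out, _ = self.c2q(context, query, query)    # Q=context, K=V=query
        q2c_out, _ = self.q2c(query, context, context)  # Q=query, K=V=context
        # Skip connections, which the abstract reports improved no-answer F1
        # without hurting has-answer F1.
        return context + c2q_out, query + q2c_out

# Example usage with random BERT-base sized embeddings.
if __name__ == "__main__":
    coattn = DirectedCoattention()
    ctx = torch.randn(2, 384, 768)  # context token embeddings
    qry = torch.randn(2, 32, 768)   # question token embeddings
    ctx_out, qry_out = coattn(ctx, qry)
    print(ctx_out.shape, qry_out.shape)
```

Keeping the two directions as separate attention calls is what distinguishes this from the baseline BERT encoder, where context and query tokens attend jointly over the full concatenated sequence.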
Author | Chadha, Ankit; Sood, Rewa |
Copyright | 2019. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
Discipline | Architecture; Physics |
EISSN | 2331-8422 |
Genre | Working Paper/Pre-Print |
Language | English |
PublicationDate | 2019-12-14 |
PublicationPlace | Ithaca |
PublicationTitle | arXiv.org |
PublicationYear | 2019 |
Publisher | Cornell University Library, arXiv.org |
SubjectTerms | Architecture; Coders; Convolution; Datasets; Feature extraction; Queries; Steroids |
Title | BERTQA -- Attention on Steroids |
URI | https://www.proquest.com/docview/2330263803 |