AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning


Bibliographic Details
Main Authors: Yang, Tao; Deng, Jinghao; Quan, Xiaojun; Wang, Qifan; Nie, Shaoliang
Format: Journal Article
Language: English
Published: 2022-10-11
Subjects: Computer Science - Computation and Language
Online Access: https://arxiv.org/abs/2210.05883

Abstract: Fine-tuning large pre-trained language models on downstream tasks is apt to suffer from overfitting when limited training data is available. While dropout proves to be an effective antidote by randomly dropping a proportion of units, existing research has not examined its effect on the self-attention mechanism. In this paper, we investigate this problem through self-attention attribution and find that dropping attention positions with low attribution scores can accelerate training and increase the risk of overfitting. Motivated by this observation, we propose Attribution-Driven Dropout (AD-DROP), which randomly discards some high-attribution positions to encourage the model to make predictions by relying more on low-attribution positions to reduce overfitting. We also develop a cross-tuning strategy to alternate fine-tuning and AD-DROP to avoid dropping high-attribution positions excessively. Extensive experiments on various benchmarks show that AD-DROP yields consistent improvements over baselines. Analysis further confirms that AD-DROP serves as a strategic regularizer to prevent overfitting during fine-tuning.
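The dropping step the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: `ad_drop_mask` and its parameter names and defaults are assumptions, and the attribution scores are taken as given (the paper derives them via self-attention attribution), so only the masking logic is shown.

```python
import random

def ad_drop_mask(attributions, candidate_ratio=0.3, drop_prob=0.5, rng=None):
    """Build a keep/drop mask over attention positions (AD-DROP-style sketch).

    Positions whose attribution score falls in the top `candidate_ratio`
    fraction become drop candidates; each candidate is independently zeroed
    with probability `drop_prob`. Low-attribution positions are always kept,
    nudging the model to rely on them. Names and defaults are illustrative.
    """
    rng = rng or random.Random(0)
    n = len(attributions)
    k = max(1, int(n * candidate_ratio))  # number of high-attribution candidates
    # Indices of the k highest-attribution positions.
    order = sorted(range(n), key=lambda i: attributions[i], reverse=True)
    candidates = set(order[:k])
    mask = []
    for i in range(n):
        if i in candidates and rng.random() < drop_prob:
            mask.append(0)  # drop a high-attribution position
        else:
            mask.append(1)  # keep the position (all low-attribution ones survive)
    return mask

# With drop_prob=1.0 every candidate is dropped, so the two highest-scoring
# positions (indices 0 and 2) are masked out:
print(ad_drop_mask([0.9, 0.1, 0.8, 0.2, 0.05],
                   candidate_ratio=0.4, drop_prob=1.0))  # → [0, 1, 0, 1, 1]
```

In a real fine-tuning loop this mask would be applied to the attention logits before softmax, and the paper's cross-tuning strategy would alternate epochs with and without it.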
Copyright: http://creativecommons.org/licenses/by-nc-sa/4.0
DOI: 10.48550/arxiv.2210.05883
Source: arXiv.org (Open Access Repository)
Open Access: Yes
Peer Reviewed: No
Open Access Link: https://arxiv.org/abs/2210.05883