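Low-Resource Machine Translation Based on Training Strategy with Changing Gradient Weight (基于梯度权重变化训练策略的低资源机器翻译)
In recent years, neural network models such as Transformer have achieved significant success in machine translation. However, training these models relies on abundant labeled data, and low-resource machine translation is limited by the size of its parallel corpora, so the trained models perform poorly and readily overfit to high-frequency vocabulary, which reduces their generalization on the test set. To alleviate this, a gradient weight variation strategy is proposed: on top of the Adam algorithm, the gradient produced by each new batch is multiplied by a coefficient. The coefficient increases over training, aiming to weaken the dependence on high-frequency features early in training while preserving the algorithm's fast convergence later on. The modified training procedure is described, including the adjustment and decay of the coefficient, so that different training stages carry different emphases. The goal of this strategy is to increase attention to low-frequency vocabulary and prevent the model from overfitting to high-frequency...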
Published in | 计算机科学与探索 (Journal of Frontiers of Computer Science & Technology) Vol. 18; no. 3; pp. 731-739 |
---|---|
Main Authors | WANG Jiaqi (王家琪), ZHU Junguo (朱俊国), YU Zhengtao (余正涛) |
Format | Journal Article |
Language | Chinese |
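Published | Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500; 01.03.2024 |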
ISSN | 1673-9418 |
DOI | 10.3778/j.issn.1673-9418.2211078 |
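Abstract | In recent years, neural network models such as Transformer have achieved significant success in machine translation. However, training these models relies on abundant labeled data, and low-resource machine translation is limited by the size of its parallel corpora, so the trained models perform poorly and readily overfit to high-frequency vocabulary, which reduces their generalization on the test set. To alleviate this, a gradient weight variation strategy is proposed: on top of the Adam algorithm, the gradient produced by each new batch is multiplied by a coefficient. The coefficient increases over training, aiming to weaken the dependence on high-frequency features early in training while preserving the algorithm's fast convergence later on. The modified training procedure is described, including the adjustment and decay of the coefficient, so that different training stages carry different emphases. The goal of this strategy is to increase attention to low-frequency vocabulary and prevent the model from overfitting to high-frequency vocabulary. Translation experiments on three low-resource bilingual datasets show that the method improves over the baseline model on the test sets by 0.72, 1.37, and 1.04 BLEU points, respectively. |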
---|---|
Abstract_FL | In recent years, neural network models such as Transformer have achieved significant success in machine translation. However, training these models relies on rich labeled data, posing a challenge for low-resource machine translation due to the limited scale of parallel corpora. This limitation often leads to subpar performance and a susceptibility to overfitting on high-frequency vocabulary, thereby reducing the model's generalization ability on the test set. To alleviate these issues, this paper proposes a strategy of gradient weight modification. Specifically, it suggests multiplying the gradients generated for each new batch by a coefficient on top of the Adam algorithm. This coefficient incrementally increases, aiming to weaken the model's dependence on high-frequency features during early training while maintaining the rapid convergence advantage of the algorithm in the later stages. This paper also outlines the modified training process, including adjustments and decay of coefficients, to emphasize different aspects at different training stages. The goal of this strategy is to enhance attention to low-frequency vocabulary and prevent the model from overfitting to high-frequency terms. Experimental translation tasks are conducted on three low-resource bilingual datasets, and the proposed method demonstrates improvements of 0.72, 1.37, and 1.04 BLEU scores relative to the baseline model on the respective test sets. |
Author | 朱俊国 王家琪 余正涛 |
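AuthorAffiliation | Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500 |
AuthorAffiliation_xml | – name: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500 |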
Author_FL | ZHU Junguo YU Zhengtao WANG Jiaqi |
Author_FL_xml | – sequence: 1 fullname: WANG Jiaqi – sequence: 2 fullname: ZHU Junguo – sequence: 3 fullname: YU Zhengtao |
Author_xml | – sequence: 1 fullname: 王家琪 – sequence: 2 fullname: 朱俊国 – sequence: 3 fullname: 余正涛 |
ClassificationCodes | TP391 |
ContentType | Journal Article |
Copyright | Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
Copyright_xml | – notice: Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
DOI | 10.3778/j.issn.1673-9418.2211078 |
DatabaseName | Wanfang Data Journals - Hong Kong WANFANG Data Centre Wanfang Data Journals 万方数据期刊 - 香港版 China Online Journals (COJ) China Online Journals (COJ) |
DocumentTitle_FL | Low-Resource Machine Translation Based on Training Strategy with Changing Gradient Weight |
EndPage | 739 |
ExternalDocumentID | jsjkxyts202403012 |
ISSN | 1673-9418 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 3 |
Keywords | 过拟合 overfitting dynamic gradient weight 动态梯度权重 neural machine translation 神经机器翻译 |
Language | Chinese |
PageCount | 9 |
PublicationCentury | 2000 |
PublicationDate | 2024-03-01 |
PublicationDateYYYYMMDD | 2024-03-01 |
PublicationDate_xml | – month: 03 year: 2024 text: 2024-03-01 day: 01 |
PublicationDecade | 2020 |
PublicationTitle | 计算机科学与探索 |
PublicationTitle_FL | Journal of Frontiers of Computer Science & Technology |
PublicationYear | 2024 |
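Publisher | Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500 |
Publisher_xml | – name: Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500 – name: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500 |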
StartPage | 731 |
Title | Low-Resource Machine Translation Based on Training Strategy with Changing Gradient Weight (基于梯度权重变化训练策略的低资源机器翻译) |
URI | https://d.wanfangdata.com.cn/periodical/jsjkxyts202403012 |
Volume | 18 |
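
The abstract describes multiplying the gradient of each new batch by an increasing coefficient before the Adam update. The record does not give the coefficient schedule or the exact place where the coefficient enters the update, so the following is only a minimal sketch of one plausible reading. The linear ramp, the names `gradient_coefficient` and `adam_with_gradient_weight`, and all default values are assumptions made for illustration, not the authors' implementation.

```python
import math


def gradient_coefficient(step, ramp_steps=1000, c_start=0.1, c_end=1.0):
    """Hypothetical schedule: ramp linearly from c_start to c_end, then hold.

    The abstract only says the coefficient increases during training;
    the linear shape and the endpoints are assumptions.
    """
    frac = min(max(step, 0) / ramp_steps, 1.0)
    return c_start + (c_end - c_start) * frac


def adam_with_gradient_weight(grad_fn, x0, steps, lr=0.05,
                              beta1=0.9, beta2=0.999, eps=1e-8,
                              ramp_steps=100):
    """Scalar Adam in which each new gradient is scaled before it enters
    the moment estimates (assumed placement of the coefficient)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        c = gradient_coefficient(t, ramp_steps=ramp_steps)
        g = c * grad_fn(x)                   # weight the new batch gradient
        m = beta1 * m + (1 - beta1) * g      # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)         # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
x_final = adam_with_gradient_weight(lambda x: 2 * x, x0=1.0, steps=200)
```

Because Adam normalizes by the second-moment estimate, a constant coefficient would largely cancel out; under this reading it is the change in the coefficient over time that lowers the weight of early gradients relative to later ones in the running averages.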