Low-Resource Machine Translation Based on Training Strategy with Changing Gradient Weight


Bibliographic Details
Published in Journal of Frontiers of Computer Science & Technology (计算机科学与探索) Vol. 18; no. 3; pp. 731-739
Main Authors WANG Jiaqi (王家琪), ZHU Junguo (朱俊国), YU Zhengtao (余正涛)
Format Journal Article
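Language Chinese
Published Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, 01.03.2024
Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500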
ISSN 1673-9418
DOI 10.3778/j.issn.1673-9418.2211078

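Abstract Neural network models such as Transformer have achieved notable success in machine translation in recent years, but training them depends on abundant labeled data. Low-resource machine translation is limited by the size of the parallel corpus, so the trained models perform poorly and readily overfit to high-frequency words, which reduces generalization on the test set. To mitigate this, a gradient weight variation strategy is proposed: on top of the Adam algorithm, the gradient produced by each new batch is multiplied by a coefficient. The coefficient increases over training, with the aim of weakening reliance on high-frequency features early in training while preserving the algorithm's fast convergence later on. The modified training procedure is described, including the adjustment and decay of the coefficient, so that different training stages have different emphases. The strategy aims to increase attention to low-frequency words and prevent overfitting to high-frequency words. In translation experiments on three low-resource bilingual datasets, the method improves test-set BLEU over the baseline model by 0.72, 1.37, and 1.04 points, respectively.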
Abstract_FL In recent years, neural network models such as Transformer have achieved significant success in machine translation. However, training these models relies on rich labeled data, which poses a challenge for low-resource machine translation because of the limited scale of parallel corpora. This limitation often leads to subpar performance and a susceptibility to overfitting on high-frequency vocabulary, thereby reducing the model's generalization ability on the test set. To alleviate these issues, this paper proposes a strategy of gradient weight modification. Specifically, on top of the Adam algorithm, it multiplies the gradient generated for each new batch by a coefficient. This coefficient increases incrementally, aiming to weaken the model's dependence on high-frequency features during early training while maintaining the rapid convergence advantage of the algorithm in the later stages. This paper also outlines the modified training process, including adjustment and decay of the coefficient, to emphasize different aspects at different training stages. The goal of this strategy is to increase attention to low-frequency vocabulary and prevent the model from overfitting to high-frequency terms. Translation experiments are conducted on three low-resource bilingual datasets, and the proposed method improves on the baseline model by 0.72, 1.37, and 1.04 BLEU points on the respective test sets.
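The abstract describes the core idea only in words: the gradient of each new batch is multiplied by an increasing coefficient before the Adam update. The following is a minimal sketch of that idea, not the authors' implementation. The linear ramp in `gradient_coefficient`, its `start`, `end`, and `ramp_steps` parameters, the class name `ScaledGradAdam`, and the toy objective are all assumptions made for illustration; the record does not give the actual schedule or the decay step the abstract mentions.

```python
import math


def gradient_coefficient(step, ramp_steps=1000, start=0.1, end=1.0):
    """Hypothetical schedule: rises linearly from `start` to `end`,
    then stays at `end`. The real schedule is not given in this record."""
    frac = min(step / ramp_steps, 1.0)
    return start + (end - start) * frac


class ScaledGradAdam:
    """Standard Adam over a list of floats, except that each new batch
    gradient is multiplied by a coefficient before it enters the
    first- and second-moment estimates."""

    def __init__(self, params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = list(params)
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * len(self.params)  # first-moment estimates
        self.v = [0.0] * len(self.params)  # second-moment estimates
        self.t = 0

    def step(self, grads, coeff):
        self.t += 1
        for i, g in enumerate(grads):
            g = coeff * g  # the only change relative to plain Adam
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        return self.params


def demo(steps=2000):
    """Toy objective f(x) = (x - 3)^2, standing in for a translation loss."""
    opt = ScaledGradAdam([0.0], lr=0.01)
    for step in range(steps):
        x = opt.params[0]
        grad = [2.0 * (x - 3.0)]
        opt.step(grad, gradient_coefficient(step, ramp_steps=500))
    return opt.params[0]
```

Because Adam divides by the square root of the second-moment estimate, a constant coefficient would largely cancel out of the update. A coefficient that changes over time instead shifts how much each new batch contributes relative to the accumulated moments, which is one plausible reading of "gradient weight" in the title.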
Author 朱俊国
王家琪
余正涛
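AuthorAffiliation Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500
AuthorAffiliation_xml – name: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500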
Author_FL ZHU Junguo
YU Zhengtao
WANG Jiaqi
Author_FL_xml – sequence: 1
  fullname: WANG Jiaqi
– sequence: 2
  fullname: ZHU Junguo
– sequence: 3
  fullname: YU Zhengtao
Author_xml – sequence: 1
  fullname: 王家琪
– sequence: 2
  fullname: 朱俊国
– sequence: 3
  fullname: 余正涛
ClassificationCodes TP391
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
Copyright_xml – notice: Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
DBID 2B.
4A8
92I
93N
PSX
TCJ
DOI 10.3778/j.issn.1673-9418.2211078
DatabaseName Wanfang Data Journals - Hong Kong
WANFANG Data Centre
Wanfang Data Journals
China Online Journals (COJ)
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
DocumentTitle_FL Low-Resource Machine Translation Based on Training Strategy with Changing Gradient Weight
EndPage 739
ExternalDocumentID jsjkxyts202403012
GroupedDBID 2B.
4A8
92I
93N
ALMA_UNASSIGNED_HOLDINGS
M~E
PSX
TCJ
ID FETCH-LOGICAL-s1042-f4529284357d57bb5c9356a58855c19602906588b71b8c9a274dd65783c1e2ce3
ISSN 1673-9418
IngestDate Thu May 29 04:00:18 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 3
Keywords overfitting
dynamic gradient weight
neural machine translation
Language Chinese
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-s1042-f4529284357d57bb5c9356a58855c19602906588b71b8c9a274dd65783c1e2ce3
PageCount 9
ParticipantIDs wanfang_journals_jsjkxyts202403012
PublicationCentury 2000
PublicationDate 2024-03-01
PublicationDateYYYYMMDD 2024-03-01
PublicationDate_xml – month: 03
  year: 2024
  text: 2024-03-01
  day: 01
PublicationDecade 2020
PublicationTitle 计算机科学与探索
PublicationTitle_FL Journal of Frontiers of Computer Science & Technology
PublicationYear 2024
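Publisher Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500
Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500
Publisher_xml – name: Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500
– name: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500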
SSID ssib054421768
ssib002040941
ssib002423894
ssib051375751
ssib023646573
ssib036438069
ssib002040926
Score 2.3461802
SourceID wanfang
SourceType Aggregation Database
StartPage 731
Title Low-Resource Machine Translation Based on Training Strategy with Changing Gradient Weight
URI https://d.wanfangdata.com.cn/periodical/jsjkxyts202403012
Volume 18
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider ISSN International Centre