Neural machine translation for Hungarian

Bibliographic Details
Published in Acta linguistica academica Vol. 69; no. 4; pp. 501-520
Main Authors Laki, László János; Yang, Zijian Győző
Format Journal Article
Language English
Published Budapest Akadémiai Kiadó 12.12.2022
Academic Publishing House
Akademiai Kiado Zrt
Abstract In this research, we give an overview of existing machine translation solutions and assess their performance on the English-Hungarian language pair. Hungarian is considered a challenging language for machine translation because its grammatical structure and word order differ substantially from those of English. We evaluated a range of machine translation systems from both academia and industry. A key finding is that our models (Marian NMT, BART) performed significantly better than the solutions offered by most market-leading multinational companies. Finally, we fine-tuned several pre-finetuned models (mT5, mBART, M2M100) for English-Hungarian translation, which achieved state-of-the-art results on our test corpora.
Copyright Copyright Akademiai Kiado Zrt Dec 2022
DOI 10.1556/2062.2022.00576
Discipline Languages & Literatures
EISSN 2560-1016
ISSN 2559-8201
Open Access true
Peer Reviewed true
Keywords Marian NMT; M2M100; mBART; neural machine translation; BART; mT5
ORCID 0000-0001-9955-860X
0000-0003-4958-7968
OpenAccessLink https://www.ceeol.com//search/article-detail?id=1087954
Subjects English language; Hungarian language; Machine translation; Translation Studies