Neural machine translation for Hungarian
Published in | Acta linguistica academica Vol. 69; no. 4; pp. 501 - 520 |
---|---|
Main Authors | Laki, László János; Yang, Zijian Győző |
Format | Journal Article |
Language | English |
Published | Budapest; Akadémiai Kiadó; 12.12.2022; Academic Publishing House; Akademiai Kiado Zrt |
Subjects | English language; Hungarian language; Machine translation; Translation Studies |
Abstract | This research gives an overview of existing machine translation solutions and assesses their performance on the English-Hungarian language pair. Hungarian is considered challenging for machine translation because its grammatical structure and word order differ greatly from those of English. We probed various machine translation systems from both academic and industrial applications. One key finding is that our models (Marian NMT, BART) performed significantly better than the solutions offered by most of the market-leading multinational companies. Finally, we fine-tuned different pre-finetuned models (mT5, mBART, M2M100) for English-Hungarian translation, which achieved state-of-the-art results on our test corpora. |
---|---|
Author | Laki, László János; Yang, Zijian Győző |
CitedBy_id | crossref_primary_10_1145_3665244 |
Cites_doi | 10.1162/tacl_a_00343 10.1162/tacl_a_00300 10.1515/pralin-2017-0003 10.18653/v1/2021.findings-acl.304 10.1162/tacl_a_00288 10.1075/cilt.292.32var |
ContentType | Journal Article |
Copyright | Copyright Akademiai Kiado Zrt Dec 2022 |
DOI | 10.1556/2062.2022.00576 |
DatabaseName | Central and Eastern European Online Library (C.E.E.O.L.) (DFG Nationallizenzen); CEEOL: Open Access; Central and Eastern European Online Library - CEEOL Journals; CrossRef; Linguistics and Language Behavior Abstracts (LLBA); ComDisDome |
DatabaseTitle | CrossRef; Linguistics and Language Behavior Abstracts (LLBA); ComDisDome |
DatabaseTitleList | Linguistics and Language Behavior Abstracts (LLBA); CrossRef |
Discipline | Languages & Literatures |
EISSN | 2560-1016 |
EndPage | 520 |
ExternalDocumentID | 10_1556_2062_2022_00576 1087954 |
ISSN | 2559-8201 |
IngestDate | Thu Oct 10 16:47:06 EDT 2024 Fri Aug 23 00:27:59 EDT 2024 Tue Oct 29 22:23:56 EDT 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 4 |
Keywords | Marian NMT; M2M100; mBART; neural machine translation; BART; mT5 |
ORCID | 0000-0001-9955-860X; 0000-0003-4958-7968 |
OpenAccessLink | https://www.ceeol.com//search/article-detail?id=1087954 |
PageCount | 20 |
PublicationCentury | 2000 |
PublicationDate | 2022-12-12 |
PublicationDecade | 2020 |
PublicationPlace | Budapest |
PublicationTitle | Acta linguistica academica |
PublicationTitleAlternate | Acta Linguistica Academica An International Journal of Linguistics (Until 2016 Acta Linguistica Hungarica) |
PublicationYear | 2022 |
Publisher | Akadémiai Kiadó; Academic Publishing House; Akademiai Kiado Zrt |
StartPage | 501 |
SubjectTerms | English language; Hungarian language; Machine translation; Translation Studies |
URI | https://www.ceeol.com//search/article-detail?id=1087954 https://www.proquest.com/docview/2779946661 |
Volume | 69 |