A Corpus-based Multilingual Comparison of AI-based Machine Translations


Bibliographic Details
Published in: Korea Journal of English Language and Linguistics, Vol. 24, pp. 257-276
Main Authors: Liu, Cuilin; Jhang, Se-Eun; Park, Homin; Hahm, Hyunjong
Format: Journal Article
Language: English
Published: 한국영어학회, 2024
ISSN: 1598-1398, 2586-7474
DOI: 10.15738/kjell.24..202404.257

Summary: This study investigates whether, and to what extent, the corpus-linguistic measure type-token ratio (TTR) is valid for identifying the quality of translations produced by different AI-based machine translation (MT) systems. Specifically, it examines discourse-level discrepancies among MT outputs generated by Google Translate, DeepL and ChatGPT 3.5, using a self-compiled multilingual corpus of English translations for the short story Eveline in Korean and Chinese. To this end, the TTR was calculated separately for different text segments within a moving span of running word-tokens, and the results were visualized with a two-dimensional approach. In addition, to verify the validity of this TTR method in predicting the differing quality of the three MT systems, the study drew on three metrics commonly used to evaluate MT quality: Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit Ordering (METEOR), and Recall-Oriented Understudy for Gisting Evaluation (ROUGE). The paper demonstrates the validity of TTR graphs in assessing the quality of a particular MT system. The findings corroborate the argument in previous studies that AI-based MT produces less lexical diversity and information density.
KCI Citation Count: 0
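The moving-span TTR described in the summary can be illustrated with a minimal Python sketch. The tokenizer, the function names `tokenize` and `moving_ttr`, and the default span of 100 tokens are assumptions made for illustration; this record does not state the window size, step, or tokenization the authors used.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a simple regex tokenizer; the study's own
    tokenization is not described in this record.
    """
    return re.findall(r"[a-z']+", text.lower())


def moving_ttr(tokens, span=100, step=1):
    """Return the type-token ratio for each window of `span` running tokens.

    TTR = (number of distinct word types) / (number of tokens) in the window.
    `span` and `step` are hypothetical defaults, not values from the paper.
    """
    if span <= 0 or step <= 0:
        raise ValueError("span and step must be positive")
    if len(tokens) < span:
        return []  # text is shorter than one window
    return [
        len(set(tokens[start:start + span])) / span
        for start in range(0, len(tokens) - span + 1, step)
    ]


if __name__ == "__main__":
    sample = "The cat saw the hat and the cat sat on the hat."
    ratios = moving_ttr(tokenize(sample), span=4)
    for start, ratio in enumerate(ratios):
        print(start, round(ratio, 2))
```

Plotting each ratio against its window start index gives a two-dimensional curve per MT system, which is one plausible reading of the "TTR graphs" mentioned above. Keeping the span fixed matters because raw TTR falls as text length grows, so only windows of equal size are comparable.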