Spoken language identification based on the transcript analysis


Bibliographic Details
Published in Digital Scholarship in the Humanities Vol. 38; no. 2; pp. 586-595
Main Authors Lande, Dmytro V, Dmytrenko, Olegh O, Shevchenko, Anatolij I, Klymenko, Mykyta S, Vakulenko, Maksym O
Format Journal Article
Language English
Published Oxford University Press 31.05.2023
Summary: Language identification is a major challenge in language engineering that arises alongside speech recognition, machine translation, cross-language information retrieval, the creation of intelligent dialogue systems, and similar tasks. The article introduces an intelligent language identification technology based on speech recognition and statistical methods of spectrogram analysis. It proposes an approach to automatically identifying the language of a spoken sample uploaded to the system, in particular from video streaming services such as YouTube. The focus is on automatic spoken language identification that draws on several speech recognition solutions, each of which may recognize the speech correctly or incorrectly and thus produce correct or incorrect text. The resulting algorithm is demonstrated on Ukrainian and Russian. Identification quality is almost 100% for utterances longer than 30 s, 98% for 30-s utterances, and 89.6% for 5-s utterances. In addition, system performance depends on the streaming speed, so the system operates in real time.
ISSN: 2055-7671
2055-768X
DOI: 10.1093/llc/fqac052
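
The summary describes choosing a language by comparing the text that several speech recognizers produce for the same utterance. The sketch below illustrates one simple way such a comparison could work; it is not the authors' algorithm. It assumes that one transcript per candidate language is already available (no recognizer is called here) and scores each transcript by the share of its tokens found in a small word list for that language. The function names, the word lists, and the sample transcripts are all hypothetical.

```python
import re
from typing import Dict, Set

# Matches runs of letters in any script (digits and underscores excluded).
TOKEN_RE = re.compile(r"[^\W\d_]+")


def vocabulary_hit_rate(transcript: str, vocabulary: Set[str]) -> float:
    """Return the fraction of transcript tokens that appear in the vocabulary."""
    tokens = [t.lower() for t in TOKEN_RE.findall(transcript)]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in vocabulary)
    return hits / len(tokens)


def identify_language(
    transcripts: Dict[str, str], vocabularies: Dict[str, Set[str]]
) -> str:
    """Pick the language whose recognizer output looks most like valid text.

    `transcripts` maps a language code to the text produced by that
    language's recognizer for the same utterance (an assumed input).
    Ties are resolved in favor of the first language in `transcripts`.
    """
    scores = {
        lang: vocabulary_hit_rate(text, vocabularies[lang])
        for lang, text in transcripts.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy word lists; a real system would need far larger ones.
VOCABULARIES = {
    "uk": {"я", "дуже", "люблю", "читати", "книги"},
    "ru": {"я", "очень", "люблю", "читать", "книги"},
}

# Hypothetical recognizer outputs for one Ukrainian utterance: the Ukrainian
# recognizer yields clean text, the Russian one yields partly wrong words.
SAMPLE_TRANSCRIPTS = {
    "uk": "я дуже люблю читати книги",
    "ru": "я душа люблю чистить книги",
}

if __name__ == "__main__":
    print(identify_language(SAMPLE_TRANSCRIPTS, VOCABULARIES))
```

The intuition behind this design is that a recognizer applied to the wrong language tends to emit more out-of-vocabulary or implausible words, so its transcript scores lower. Closely related languages share many words, which is one reason short utterances are harder to classify than long ones.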