Spoken Indian Language Identification using MFCC and Vowel Onset Points

Bibliographic Details
Published in: 2023 9th International Conference on Smart Computing and Communications (ICSCC), pp. 150-155
Main Authors: Siyad, Hajara Muhammed; George, Anu
Format: Conference Proceeding
Language: English
Published: IEEE, 17.08.2023
DOI: 10.1109/ICSCC59169.2023.10335007

Summary: Conversational speech carries information about the speaker, the language, and the content. By extracting the language-specific information from speech, the language of an utterance can be identified efficiently. Language Identification (LID) is the task of identifying the spoken language. Speech recognition, speech translation, and voice-activated automatic systems all benefit from the ability to identify the language of speech. An LID system can also aid speaker recognition, since identifying the language narrows the search space. This research proposes a language identification approach that applies a Random Forest (RF) classifier to Vowel Onset Point (VOP) and Mel Frequency Cepstral Coefficient (MFCC) features. VOP and MFCC are vocal tract features, and when extracted from spoken words they carry both language-related and speaker-related information. Extracting language-specific characteristics is a prerequisite for language identification, and the combination of the two feature types outperforms each one used separately. The experiments use multilingual clean speech signals (Hindi, Assamese, and Malayalam) obtained from the IITKGP dataset. A random selection of 835 speech signals from this database is used in the proposed model: 80% of the 835 samples were used for training, and the remaining 20% from the same dataset were used to evaluate the proposed models. The language models are evaluated with MFCC features alone and with the combined features. Experiments that use both features together are more accurate than those that use MFCC features alone. The random forest and K-nearest neighbor (KNN) classifiers are also compared on the same combined features. With the combined features, the random forest model reaches a language identification accuracy of 84.1%, while KNN reaches only 71.9%.
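
The evaluation protocol described in the summary (835 samples, an 80/20 split, MFCC-only versus combined features, and a KNN classifier) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature vectors are synthetic placeholders rather than real MFCC or VOP features, combining the two feature types by simple concatenation is an assumption, and k = 5, the vector sizes, and all function names are hypothetical. The random forest classifier is not reproduced here.

```python
import math
import random
from collections import Counter

LANGUAGES = ["Hindi", "Assamese", "Malayalam"]


def make_synthetic_sample(lang_index, rng, n_mfcc=13, n_vop=2):
    """Return placeholder (mfcc, vop) vectors; these are NOT real speech features."""
    mfcc = [rng.gauss(lang_index * 0.5, 1.0) for _ in range(n_mfcc)]
    vop = [rng.gauss(lang_index * 1.5, 1.0) for _ in range(n_vop)]
    return mfcc, vop


def split_80_20(items, rng):
    """Shuffle a copy of the items and split it 80% train / 20% test."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 80 // 100  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, query, k=5):
    """Majority vote among the k training vectors closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of test vectors whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


rng = random.Random(0)

# 835 synthetic samples, cycling through the three languages.
samples = []
for i in range(835):
    lang = i % len(LANGUAGES)
    mfcc, vop = make_synthetic_sample(lang, rng)
    samples.append((mfcc, vop, LANGUAGES[lang]))

train_raw, test_raw = split_80_20(samples, rng)

# MFCC-only feature sets.
mfcc_train = [(m, y) for m, v, y in train_raw]
mfcc_test = [(m, y) for m, v, y in test_raw]

# Combined feature sets: concatenation of MFCC and VOP vectors (an assumption).
fused_train = [(m + v, y) for m, v, y in train_raw]
fused_test = [(m + v, y) for m, v, y in test_raw]

acc_mfcc = accuracy(mfcc_train, mfcc_test)
acc_fused = accuracy(fused_train, fused_test)

print(f"train/test sizes: {len(train_raw)}/{len(test_raw)}")
print(f"KNN accuracy, MFCC only: {acc_mfcc:.3f}")
print(f"KNN accuracy, MFCC + VOP: {acc_fused:.3f}")
```

The accuracies printed by this sketch depend entirely on the synthetic data and say nothing about the figures reported in the summary. With real data, the placeholder generator would be replaced by MFCC and VOP extraction from the speech signals.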