Development of the information system for the Kazakh language preprocessing

Bibliographic Details
Published in Cogent Engineering Vol. 8; no. 1
Main Authors Akhmed-Zaki, Darkhan; Mansurova, Madina; Madiyeva, Gulmira; Kadyrbek, Nurgali; Kyrgyzbayeva, Marzhan
Format Journal Article
Language English
Published Abingdon Cogent 01.01.2021
Taylor & Francis Ltd
Taylor & Francis Group
Summary: The aim of this work is the design and development of linguistic resources and preprocessing tools for the Kazakh language. A media corpus of the Kazakh language is presented as a linguistic resource and is available on the Al-Farabi Kazakh National University platform. The corpus consists of news texts and is implemented as an information system. The article describes the general architecture of an information system for the automatic and reliable collection, storage, and analysis of texts in the Kazakh language. It presents three automatic text preprocessing tools for Kazakh: a word-form generator, a morphological analyzer, and a morphological disambiguation tool. The proposed tools can also be applied in automatic text analysis systems and in the creation of other linguistic resources such as thesauri and ontologies.
ISSN: 2331-1916
DOI: 10.1080/23311916.2021.1896418