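Eesti keele kui teise keele õpikute lausete analüüs ja selle rakendamine eri keeleoskustasemete sõnastike näitelausete automaatsel valikul [Analysis of sentences in Estonian as a second language coursebooks and its application to the automatic selection of dictionary example sentences for different language proficiency levels]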

Bibliographic Details
Published in	Eesti Rakenduslingvistika Ühingu aastaraamat Vol. 15; no. 15; pp. 99–119
Main Author	Koppel, Kristina
Format	Journal Article
Language	Estonian, English
Published	Eesti Rakenduslingvistika Ühing (ERÜ) / Estonian Association for Applied Linguistics (EAAL), 01.05.2019
Subjects	corpora; corpus lexicography; corpus linguistics; Estonian; Estonian as a second language; Language studies; learners’ corpora; Lexis
Summary:	The aim of the study was to develop new Estonian GDEX configurations for the A, B and C language proficiency levels. GDEX (Good Dictionary Example) (Kilgarriff et al. 2008) is a software module of the corpus query system Sketch Engine (Kilgarriff et al. 2004) that helps to identify good dictionary example candidates in large corpora. To identify which parameters characterise sentences at each proficiency level, full sentences from the Estonian Coursebook Corpus 2018 were analysed with the Analyser of Sentence Parameters, a program developed at the Institute of the Estonian Language. The analyser reports how long the sentences and tokens are, which verb forms are used, what syntactic properties the sentences have, etc. The analysis showed that, compared with the latest Estonian GDEX configuration (1.4), parameters such as sentence and token length and the occurrence of certain verb forms and parts of speech needed to be adjusted. Accordingly, the sentence length was set to 3–14 tokens for A-level (optimal interval 4–7 tokens), 3–18 tokens for B-level (optimal interval 4–12 tokens) and 4–23 tokens for C-level (optimal interval 6–14 tokens). A new classifier was introduced that penalises tokens longer than 9 characters at A-level and tokens longer than 11 characters at B-level. At A- and B-levels certain verb forms were penalised or banned from appearing in the sentence. etSkELL, a corpus tool for Estonian language learning, and the dictionary portal Sõnaveeb (Wordweb) are introduced as possible ways to put the output of the new GDEX configurations to use. The results of this paper can be applied in compiling corpora and teaching materials for different language proficiency levels.
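The length thresholds reported in the summary can be illustrated with a minimal Python sketch. This is not the actual GDEX configuration syntax or the author's implementation: the names `LevelConfig`, `LEVELS`, `length_score` and `score_sentence`, the linear fall-off outside the optimal interval, the penalty of 0.1 per long token and the whitespace tokenisation are all assumptions made for illustration. Only the numeric intervals and character limits come from the summary. Verb-form penalties are left out because the summary does not say which forms are affected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LevelConfig:
    min_len: int                     # shortest sentence allowed (tokens)
    max_len: int                     # longest sentence allowed (tokens)
    opt_min: int                     # lower bound of the optimal interval
    opt_max: int                     # upper bound of the optimal interval
    max_token_chars: Optional[int]   # None: no token-length penalty reported


# Numeric values taken from the summary; everything else is hypothetical.
LEVELS = {
    "A": LevelConfig(3, 14, 4, 7, 9),
    "B": LevelConfig(3, 18, 4, 12, 11),
    "C": LevelConfig(4, 23, 6, 14, None),
}

LONG_TOKEN_PENALTY = 0.1  # assumed value, not given in the summary


def length_score(n: int, cfg: LevelConfig) -> float:
    """Return 1.0 inside the optimal interval, 0.0 outside the allowed
    range, and an (assumed) linear fall-off in between."""
    if n < cfg.min_len or n > cfg.max_len:
        return 0.0
    if cfg.opt_min <= n <= cfg.opt_max:
        return 1.0
    if n < cfg.opt_min:
        return (n - cfg.min_len + 1) / (cfg.opt_min - cfg.min_len + 1)
    return (cfg.max_len - n + 1) / (cfg.max_len - cfg.opt_max + 1)


def score_sentence(tokens: List[str], level: str) -> float:
    """Score a tokenised sentence for proficiency level 'A', 'B' or 'C'."""
    cfg = LEVELS[level]
    score = length_score(len(tokens), cfg)
    if cfg.max_token_chars is not None:
        # Subtract a fixed penalty for every token above the character limit.
        long_tokens = sum(1 for t in tokens if len(t) > cfg.max_token_chars)
        score -= LONG_TOKEN_PENALTY * long_tokens
    return max(score, 0.0)


if __name__ == "__main__":
    sentence = "Ma lähen täna kooli".split()  # naive whitespace tokenisation
    for level in ("A", "B", "C"):
        print(level, round(score_sentence(sentence, level), 2))
```

With these assumptions, a four-token sentence of short words falls inside the optimal interval for A- and B-level but below it for C-level, so the same sentence ranks differently depending on the target level.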
ISSN:	1736-2563, 2228-0677
DOI:	10.5128/ERYa15.06