Õppijasõbralik korpuslause: automaatse valiku võimalusi [Learner-friendly corpus sentences: options for automatic selection]



Bibliographic Details
Published in	Lähivertailuja Vol. 26; no. 26; pp. 222 - 250
Main Authors	Kallas, Jelena, Koppel, Kristina
Format	Journal Article
Language	Estonian, Finnish, English
Published	Tallinn: Eesti Rakenduslingvistika Ühing (ERÜ) = Estonian Association for Applied Linguistics (EAAL), 01.01.2016
Subjects	Corpus lexicography; Corpus linguistics; Dictionaries; Estonian; Estonian language; Finno-Ugrian studies; Foreign languages learning; Language learning; Learners’ lexicography; Lexicography; Obscenities; Philology; Sentences; Slang; Theoretical Linguistics; Word length

Summary:	The paper presents how corpus sentences can be used in learners’ lexicography and in data-driven language learning. There are two methods for the automatic selection of corpus sentences suitable for language learners: machine learning methods and rule-based methods. The paper focuses on the rule-based methods and describes them through the example of a tool called GDEX (Good Dictionary Example) (Kilgarriff et al. 2008). GDEX helps automatically select sentences suitable for language learners. It takes into account certain parameters: sentence and word length, threshold of low-frequency words, keyword position, the absence and presence of certain words, etc. The paper introduces the parameters of the Estonian GDEX configuration and discusses which parameters need to be studied further. The paper also introduces the new corpus Estonian NC GDEX, aimed at language learners. The corpus contains only sentences that meet the requirements of the Estonian GDEX configuration. In the sentences, there are no low-frequency words, vocabulary is controlled (no slang, vulgarisms or profanities occur), and all sentences are full sentences and contain verbs. At the moment, the new corpus is accessible only in the corpus query system Sketch Engine (Kilgarriff et al. 2004). In the future, it will be possible to integrate it into dictionary portals aimed at language learners.
ISSN:	1736-9290 2228-3854
DOI:	10.5128/LV26.07
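
The rule-based selection described in the summary can be illustrated with a minimal Python sketch. It is not the GDEX tool or its Sketch Engine configuration syntax. The class `RuleConfig`, the function `is_learner_friendly`, all threshold values and the toy word lists are hypothetical and chosen only for illustration. The sketch checks sentence length, word length, a blacklist, a stand-in for a frequency list, keyword position, and a crude "full sentence" test. The requirement that a sentence contain a verb is omitted because it would need a part-of-speech tagger.

```python
import re
from dataclasses import dataclass


@dataclass
class RuleConfig:
    # All values are illustrative assumptions, not the Estonian GDEX settings.
    min_tokens: int = 4
    max_tokens: int = 20
    max_word_length: int = 20
    max_keyword_position: float = 0.8      # relative position of the keyword
    blacklist: frozenset = frozenset()     # e.g. slang, vulgarisms
    common_words: frozenset = frozenset()  # stand-in for a frequency list


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def is_learner_friendly(sentence, keyword, config):
    """Return True if the sentence passes every rule in `config`."""
    text = sentence.strip()
    tokens = tokenize(text)
    # Rule 1: sentence length within bounds (also rejects empty input).
    if not (config.min_tokens <= len(tokens) <= config.max_tokens):
        return False
    # Rule 2: crude full-sentence test (capital start, final punctuation).
    if not (text[0].isupper() and text.endswith((".", "!", "?"))):
        return False
    # Rule 3: no overly long words.
    if any(len(t) > config.max_word_length for t in tokens):
        return False
    # Rule 4: no blacklisted words.
    if any(t in config.blacklist for t in tokens):
        return False
    # Rule 5: no low-frequency words (anything outside the common list).
    if any(t not in config.common_words for t in tokens):
        return False
    # Rule 6: the keyword must occur and must not sit too late in the sentence.
    key = keyword.lower()
    if key not in tokens:
        return False
    return tokens.index(key) / len(tokens) <= config.max_keyword_position


# Toy English data; a real setup would match lemmas, which matters
# for a heavily inflected language such as Estonian.
config = RuleConfig(
    common_words=frozenset({"the", "cat", "sleeps", "on", "sofa", "damn"}),
    blacklist=frozenset({"damn"}),
)
print(is_learner_friendly("The cat sleeps on the sofa.", "cat", config))
print(is_learner_friendly("The damn cat sleeps on the sofa.", "cat", config))
```

The sketch applies hard pass/fail rules, which mirrors how the summary describes the Estonian NC GDEX corpus: it contains only sentences that meet all requirements. Whether the actual tool weights or scores individual parameters is not stated in this record.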