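Leksikograafide ja keeleõppijate hinnangud automaatselt tuvastatud korpuslausete sobivusele õppesõnastiku näitelauseks [Lexicographers' and language learners' assessments of the suitability of automatically identified corpus sentences as example sentences for a learner's dictionary]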


Bibliographic Details
Published in Lähivertailuja Vol. 29; no. 29; pp. 84-112
Main Author Koppel, Kristina
Format Journal Article
Language Estonian, English
Published Eesti Rakenduslingvistika Ühing (ERÜ) / Estonian Association for Applied Linguistics (EAAL), 31.10.2019
Summary: This paper reports on an assessment task carried out among students of Tallinn University and the University of Tartu, who speak Estonian at B2-C1 proficiency level, and among lexicographers working at the Institute of the Estonian Language. The purpose of the task was to determine whether, according to these two types of annotators, authentic and unedited corpus sentences would be suitable as example sentences for learners’ dictionaries at B2-C1 level. The results were also intended to help evaluate the output of version 1.4 of the Estonian module of GDEX (GDEX 1.4), which is used to choose and display web sentences in the Institute’s new language portal Sõnaveeb. GDEX (Good Dictionary Example) is a function of the corpus query system Sketch Engine, designed to find optimal example sentence candidates in large corpora. The results of the assessment task confirmed three hypotheses: 1) authentic corpus sentences must be filtered before they are displayed to end-users; 2) GDEX 1.4 can identify good example candidates in corpora and filter out inappropriate candidates; 3) example sentences compiled by lexicographers are suitable example sentences. Both types of annotators considered 96% of the dictionary examples, and 85% of the corpus sentences chosen as good examples by GDEX 1.4, to be suitable example sentences. Only 6% of the sentences discarded by GDEX 1.4 were considered suitable, meaning that 94% of the bad candidates had been filtered out successfully. Of the unfiltered corpus sentences, 60% were considered unsuitable. When asked why they considered a sentence unsuitable, the annotators most often said that the sentence included anaphora and hence needed more context, or that it was colloquial, too long, or too short.
ISSN: 1736-9290, 2228-3854
DOI: 10.5128/LV29.03