Towards a frequency normalization of CREA and CORDE corpora
Published in | Revista signos Vol. 48; no. 89; p. 307 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | Spanish |
Published | Valparaíso: Dr. Giovanni Parodi, 01.12.2015 |
Subjects | |
Summary: | CORDE (Corpus Diacrónico del Español) and CREA (Corpus de Referencia del Español Actual) are two of the largest and most frequently used databases in the study of the Spanish language. However, they have some limitations in terms of size, sample unit and representativeness that may influence the results of studies and descriptions of linguistic phenomena. In this paper we identify these limitations and propose a method for the normalization of document frequencies by computing moving averages. We show how this method allows for a more realistic interpretation of corpus data and, thus, a more effective use of these resources. |
---|---|
ISSN: | 0035-0451 0718-0934 |
DOI: | 10.4067/S0718-09342015000300002 |
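
The summary describes normalizing document frequencies with moving averages but gives no formula, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes yearly counts of documents containing a term and yearly corpus totals; the function names, the window size of 3, and the sample figures are all hypothetical.

```python
from typing import Dict


def relative_frequencies(hits: Dict[int, int], totals: Dict[int, int]) -> Dict[int, float]:
    """Share of documents per year that contain the term (assumed normalization)."""
    return {
        year: hits.get(year, 0) / totals[year]
        for year in sorted(totals)
        if totals[year] > 0  # skip years with no documents in the corpus
    }


def moving_average(series: Dict[int, float], window: int = 3) -> Dict[int, float]:
    """Centered moving average over consecutive data points.

    The window shrinks at both ends of the series, so edge years are
    averaged over fewer points.
    """
    years = sorted(series)
    half = window // 2
    smoothed = {}
    for i, year in enumerate(years):
        start = max(0, i - half)
        end = min(len(years), i + half + 1)
        values = [series[y] for y in years[start:end]]
        smoothed[year] = sum(values) / len(values)
    return smoothed


# Hypothetical counts, not taken from CORDE or CREA.
hits = {2000: 2, 2001: 0, 2002: 6}
totals = {2000: 10, 2001: 10, 2002: 20}

rel = relative_frequencies(hits, totals)
smooth = moving_average(rel, window=3)
```

Dividing by the yearly document total offsets uneven corpus size across periods, and the moving average dampens spikes caused by years with very few documents. The appropriate window width depends on the corpus and is left open here.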