z-squared: The Origin and Application of χ²
| Published in | *Journal of Quantitative Linguistics*, Vol. 20, No. 4, pp. 350–378 |
|---|---|
| Main Author | |
| Format | Journal Article |
| Language | English |
| Published | Routledge, 01.11.2013 |
Summary: A set of statistical tests termed contingency tests, of which χ² is the most well-known example, is commonly employed in linguistics research. Contingency tests compare discrete distributions, that is, data divided into two or more alternative categories, such as alternative linguistic choices of a speaker or different experimental conditions. These tests are ubiquitous and form part of every linguistics researcher's arsenal. However, the mathematical underpinnings of these tests are rarely discussed in the literature in an approachable way, with the result that many researchers may apply tests inappropriately, fail to see the possibility of testing particular questions, or draw unsound conclusions. Contingency tests are also closely related to the construction of confidence intervals, which are highly useful and revealing methods for plotting the certainty of experimental observations. This paper is organized in the following way. The foundations of the simplest type of χ² test, the 2 × 1 goodness of fit test, are introduced and related to the z test for a single observed proportion p and the Wilson score confidence interval about p. We then show how the 2 × 2 test for independence (homogeneity) is derived from two observations p₁ and p₂ and explain when each test should be used. We also briefly introduce the Newcombe-Wilson test, which ideally should be used in preference to the χ² test for observations drawn from two independent populations (such as two sub-corpora). We then turn to tests for larger tables, generally termed r × c tests, which have multiple degrees of freedom and therefore may encompass multiple trends, and discuss strategies for their analysis. Finally, we turn briefly to the question of differentiating test results. We introduce the concept of effect size (also termed "measures of association") and explain how we may perform statistical separability tests to distinguish between two sets of results.
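Two of the quantities named in the abstract, the Wilson score interval for a single proportion and the 2 × 2 χ² statistic, can be computed directly from their standard textbook definitions. The sketch below is an illustration of those standard formulas, not code from the paper itself, and the counts used in the example are hypothetical:

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for an observed proportion p_hat out of n trials.
    z = 1.96 corresponds to a ~95% interval."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 contingency table
    [[a, b], [c, d]] (1 degree of freedom)."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total  # expected count under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts: 45 of 100 speakers choose variant A in sub-corpus 1,
# 30 of 80 in sub-corpus 2.
lo, hi = wilson_interval(0.45, 100)          # interval about p = 0.45
chi2 = chi_squared_2x2([[45, 55], [30, 50]]) # test of independence
```

Note that the Wilson interval is asymmetric about p, unlike the naive "Wald" interval p ± z·√(p(1−p)/n), which is one reason the paper relates it to the χ² family of tests.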
| Bibliography | ObjectType-Article-1; SourceType-Scholarly Journals-1; ObjectType-Feature-2; content type line 23 |
|---|---|
| ISSN | 0929-6174, 1744-5035 |
| DOI | 10.1080/09296174.2013.830554 |