The complexity of measuring reliability in learning tasks: An illustration using the Alternating Serial Reaction Time Task

Bibliographic Details
Published in	Behavior Research Methods, Vol. 56, no. 1, pp. 301-317
Main Authors	Farkas, Bence C.; Krajcsi, Attila; Janacsek, Karolina; Nemeth, Dezso
Format	Journal Article
Language	English
Published	New York: Springer US, 01.01.2024; Springer Nature B.V.; Psychonomic Society, Inc.
Subjects	Behavioral Science and Psychology; Cognitive ability; Cognitive Psychology; Humanities and Social Sciences; Humans; Learning; Psychology; Psychometrics; Reaction Time; Reaction time task; Reproducibility of Results; Response time; Cronbach's alpha; Reliability; Statistical learning; Procedural memory; Alternating Serial Reaction Time Task; Sequence learning
Summary:	Despite the fact that reliability estimation is crucial for robust inference, it is underutilized in neuroscience and cognitive psychology. Appreciating reliability can help researchers increase statistical power, effect sizes, and reproducibility, decrease the impact of measurement error, and inform methodological choices. However, accurately calculating reliability for many experimental learning tasks is challenging. In this study, we highlight a number of these issues, and estimate multiple metrics of internal consistency and split-half reliability of a widely used learning task on a large sample of 180 subjects. We show how pre-processing choices, task length, and sample size can affect reliability and its estimation. Our results show that the Alternating Serial Reaction Time Task has respectable reliability, especially when learning scores are calculated based on reaction times and two-stage averaging. We also show that a task length of 25 blocks can be sufficient to meet the usual thresholds for minimally acceptable reliability. We further illustrate how relying on a single point estimate of reliability can be misleading, and the calculation of multiple metrics, along with their uncertainties, can lead to a more complete characterization of the psychometric properties of tasks.
PMCID:	PMC10794483
ISSN:	1554-351X; 1554-3528
DOI:	10.3758/s13428-022-02038-5
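The summary refers to internal consistency (Cronbach's alpha), split-half reliability, and how task length affects reliability. As a rough illustration only, the Python sketch below computes these quantities on simulated data. The per-block scores, the 180 × 25 design, and the noise levels in the hypothetical `simulate` helper are assumptions, not the study's data. The permutation-based split-half routine with a Spearman-Brown correction is a generic technique and is not presented as the authors' exact pipeline (in particular, it does not implement their two-stage averaging of reaction times).

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r, k=2.0):
    """Predicted reliability after lengthening a test by a factor of k."""
    return k * r / (1 + (k - 1) * r)


def length_factor_needed(r, target):
    """Inverse Spearman-Brown: length factor needed to reach a target reliability."""
    return target * (1 - r) / (r * (1 - target))


def cronbach_alpha(scores):
    """Cronbach's alpha, treating each block as an 'item' (rows = subjects)."""
    k = len(scores[0])
    item_vars = [statistics.variance([s[j] for s in scores]) for j in range(k)]
    total_var = statistics.variance([sum(s) for s in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def split_half(scores, n_splits=500, seed=0):
    """Permutation-based split-half reliability, Spearman-Brown corrected.

    Returns the median estimate and the 2.5th/97.5th percentiles across random
    splits (a spread over splits, not a sampling confidence interval). With an
    odd number of blocks the halves differ by one block, so k=2 is approximate.
    """
    rng = random.Random(seed)
    n_blocks = len(scores[0])
    estimates = []
    for _ in range(n_splits):
        idx = list(range(n_blocks))
        rng.shuffle(idx)  # random assignment of blocks to the two halves
        half_a, half_b = idx[: n_blocks // 2], idx[n_blocks // 2:]
        a = [statistics.fmean(s[i] for i in half_a) for s in scores]
        b = [statistics.fmean(s[i] for i in half_b) for s in scores]
        estimates.append(spearman_brown(pearson(a, b)))
    estimates.sort()
    lo = estimates[int(0.025 * n_splits)]
    hi = estimates[max(int(0.975 * n_splits) - 1, 0)]
    return statistics.median(estimates), (lo, hi)


def simulate(n_subjects=180, n_blocks=25, seed=1):
    """Hypothetical data: a stable per-subject learning effect plus block-level noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        true_score = rng.gauss(10.0, 5.0)  # assumed between-subject spread
        data.append([true_score + rng.gauss(0.0, 15.0) for _ in range(n_blocks)])
    return data


if __name__ == "__main__":
    data = simulate()
    print("alpha:", round(cronbach_alpha(data), 3))
    median_r, (lo, hi) = split_half(data)
    print("split-half median:", round(median_r, 3), "spread:", (round(lo, 3), round(hi, 3)))
    # 0.8 is an arbitrary example target, not a threshold taken from the study.
    print("length factor for 0.8:", round(length_factor_needed(median_r, 0.8), 2))
```

Run as a script, it prints alpha, the median split-half estimate with its spread across random splits, and the length factor that Spearman-Brown predicts is needed to reach the example target. Multiplying that factor by the current number of blocks gives a rough projected task length, under the Spearman-Brown assumption that added blocks are parallel to the existing ones. Reporting the spread rather than a single split reflects the summary's point that one point estimate can be misleading.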