Performance of intraclass correlation coefficient (ICC) as a reliability index under various distributions in scale reliability studies

Bibliographic Details
Published in	Statistics in medicine Vol. 37; no. 18; pp. 2734 - 2752
Main Authors	Mehta, Shraddha; Bastero‐Caballero, Rowena F.; Sun, Yijun; Zhu, Ray; Murphy, Diane K.; Hardas, Bhushan; Koch, Gary
Format	Journal Article
Language	English
Published	England: Wiley Subscription Services, Inc; John Wiley and Sons Inc, 15.08.2018
Subjects	aesthetics; Bias; Computer Simulation; Humans; intra‐class correlation; Medical research; Observer Variation; reliability; Reproducibility of Results; Sample Size; scales; subject distribution
Summary:	Many published scale validation studies determine inter‐rater reliability using the intra‐class correlation coefficient (ICC). However, the use of this statistic must consider its advantages, limitations, and applicability. This paper evaluates how the interaction of subject distribution, sample size, and levels of rater disagreement affects ICC and provides an approach for obtaining relevant ICC estimates under suboptimal conditions. Simulation results suggest that, for a fixed number of subjects, ICC from the convex distribution is smaller than ICC for the uniform distribution, which in turn is smaller than ICC for the concave distribution. The variance component estimates also show that the dissimilarity of ICC among distributions is attributable to the study design (i.e., distribution of subjects) component of subject variability and not to the scale quality component of rater error variability. The dependency of ICC on the distribution of subjects makes it difficult to compare results across reliability studies. Hence, it is proposed that reliability studies should be designed using a uniform distribution of subjects because of the standardization it provides for representing objective disagreement. In the absence of a uniform distribution, a sampling method is proposed to reduce the non‐uniformity. In addition, as expected, high levels of disagreement result in low ICC, and when the type of distribution is fixed, any increase in the number of subjects beyond a moderately large specification such as n = 80 does not have a major impact on ICC.
Bibliography:	The copyright line for this article was changed on 10 September 2018 after original online publication.
ISSN:	0277-6715, 1097-0258
DOI:	10.1002/sim.7679
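
The dependence of ICC on the distribution of subjects, as described in the Summary, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' procedure: the 5-point scale, the three probability vectors, the rater error model (normal noise, rounded and clipped to the scale), and the one-way random-effects form ICC(1,1) are all hypothetical choices, since this record does not specify them. The names `icc_oneway`, `simulate_icc`, `center_heavy`, and `extreme_heavy` are illustrative only, and the record does not state how such shapes map onto its "convex" and "concave" labels.

```python
import numpy as np


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) for an (n_subjects, k_raters) array."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    subject_means = ratings.mean(axis=1)
    grand_mean = ratings.mean()
    # Between-subject and within-subject mean squares from one-way ANOVA.
    msb = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((ratings - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical 5-point scale and subject distributions (assumptions, not from the paper).
GRADES = np.arange(5)
DISTRIBUTIONS = {
    "center_heavy": [0.10, 0.20, 0.40, 0.20, 0.10],
    "uniform": [0.20, 0.20, 0.20, 0.20, 0.20],
    "extreme_heavy": [0.30, 0.15, 0.10, 0.15, 0.30],
}


def simulate_icc(probs, n_subjects=80, n_raters=4, error_sd=0.7, n_reps=200, seed=0):
    """Mean ICC over n_reps simulated studies with the same rater error model."""
    rng = np.random.default_rng(seed)
    iccs = []
    for _ in range(n_reps):
        # Draw each subject's true grade from the chosen distribution.
        true_grade = rng.choice(GRADES, size=n_subjects, p=probs)
        # Each rater sees the true grade plus noise, rounded and clipped to the scale.
        noise = rng.normal(0.0, error_sd, size=(n_subjects, n_raters))
        ratings = np.clip(np.rint(true_grade[:, None] + noise), GRADES[0], GRADES[-1])
        iccs.append(icc_oneway(ratings))
    return float(np.mean(iccs))


if __name__ == "__main__":
    for name, probs in DISTRIBUTIONS.items():
        print(name, round(simulate_icc(probs), 3))
```

Because the rater error model is identical in all three runs, any difference in the mean ICC comes from the spread of the subjects' true grades alone: the more the subjects are concentrated in the middle of the scale, the smaller the between-subject variance and the lower the ICC.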