How to Estimate Intraclass Correlation Coefficients for Interrater Reliability from Planned Incomplete Data

Bibliographic Details
Published in: Multivariate Behavioral Research, pp. 1-20
Main Authors: Ten Hove, Debby; Jorgensen, Terrence D.; Van der Ark, L. Andries
Format: Journal Article
Language: English
Published: United States, 16.06.2025
ISSN: 0027-3171 (print), 1532-7906 (electronic)
DOI: 10.1080/00273171.2025.2507745

Abstract
The interrater reliability (IRR) of observational data is often estimated by means of intraclass correlation coefficients (ICCs), which are flexible IRR estimators that are based on the variance decomposition of scores obtained by observations. ICCs are typically estimated using mean squares from an ANOVA model, the computation of which is not straightforward for incomplete data. However, many studies in behavioral research use planned missing observational designs, in which the raters partially vary across subjects. Planned missing designs result in incomplete data. Therefore, we simulated planned incomplete data and compared the computational accuracy (bias of point estimates, bias of variability estimates, root mean squared error, and coverage rates) and computational feasibility (convergence rates and estimation time) of three recently proposed estimation methods for ICCs: Markov chain Monte Carlo estimation of Bayesian hierarchical linear models, maximum likelihood estimation of random-effects models, and maximum likelihood estimation of common-factor models. Maximum likelihood estimation of random-effects models with Monte-Carlo confidence intervals is preferred based on all criteria. This article is accompanied by R code, which enables researchers to apply these estimation methods. A demonstration of the R code to a real-data set from an educational context is provided.
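Illustrative sketch (not from the article): the abstract describes ICCs as functions of a variance decomposition and describes planned missing designs in which raters partially vary across subjects. The Python sketch below shows both ideas in their simplest form. It assumes the variance components are already known or estimated elsewhere, and it uses the standard two-way random-effects ICC definitions (agreement and consistency, for a single rater or the mean of k raters), which may differ from the estimators evaluated in the article. The function names icc_from_variance_components and planned_missing_design are hypothetical and do not come from the accompanying R code.

```python
import random


def icc_from_variance_components(var_subject, var_rater, var_residual, k=1):
    """Return agreement and consistency ICCs from three variance components.

    Assumes a two-way random-effects decomposition:
    score = subject effect + rater effect + residual.
    k is the number of raters whose scores are averaged (k=1: single rater).
    """
    if min(var_subject, var_rater, var_residual) < 0:
        raise ValueError("variance components must be non-negative")
    if k < 1:
        raise ValueError("k must be at least 1")
    if var_subject + var_residual <= 0:
        raise ValueError("subject plus residual variance must be positive")

    # Agreement: rater main effects count as error.
    agreement_error = (var_rater + var_residual) / k
    # Consistency: rater main effects are ignored.
    consistency_error = var_residual / k

    return {
        "agreement": var_subject / (var_subject + agreement_error),
        "consistency": var_subject / (var_subject + consistency_error),
    }


def planned_missing_design(n_subjects, n_raters, k, seed=0):
    """Return an n_subjects x n_raters matrix of booleans.

    True means the rater observes the subject. Each subject is assigned
    k raters drawn at random from the pool, so raters partially vary
    across subjects and the resulting data matrix is incomplete by design.
    """
    if not 1 <= k <= n_raters:
        raise ValueError("k must be between 1 and n_raters")
    rng = random.Random(seed)
    design = []
    for _ in range(n_subjects):
        chosen = set(rng.sample(range(n_raters), k))
        design.append([rater in chosen for rater in range(n_raters)])
    return design


if __name__ == "__main__":
    # Hypothetical variance components, chosen only for illustration.
    single = icc_from_variance_components(4.0, 1.0, 3.0)
    print(single["agreement"])  # 4 / (4 + 1 + 3) = 0.5

    # Six subjects, a pool of four raters, two raters per subject.
    for row in planned_missing_design(6, 4, 2, seed=1):
        print(row)
```

In this sketch every row of the design matrix contains exactly k observed cells, so a complete-data ANOVA on the full subject-by-rater table is not directly available. That is the situation the estimation methods compared in the article are meant to handle.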
Authors (ORCID)
1. Ten Hove, Debby (0000-0002-1335-4452)
2. Jorgensen, Terrence D. (0000-0001-5111-6773)
3. Van der Ark, L. Andries (0000-0003-3131-7943)
PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/40524384 (PMID 40524384)
Keywords: simulation; planned-missing designs; interrater reliability; observational research; generalizability theory; incomplete data; intraclass correlation coefficients
Open access version: https://research.vu.nl/en/publications/3d463001-af23-4a72-b8f6-ecd4bc219792