How to Estimate Intraclass Correlation Coefficients for Interrater Reliability from Planned Incomplete Data
Published in | Multivariate behavioral research pp. 1 - 20 |
---|---|
Main Authors | Ten Hove, Debby; Jorgensen, Terrence D.; Van der Ark, L. Andries |
Format | Journal Article |
Language | English |
Published | United States 16.06.2025 |
Subjects | |
ISSN | 0027-3171 1532-7906 |
DOI | 10.1080/00273171.2025.2507745 |
Abstract | The interrater reliability (IRR) of observational data is often estimated by means of intraclass correlation coefficients (ICCs), which are flexible IRR estimators that are based on the variance decomposition of scores obtained by observations. ICCs are typically estimated using mean squares from an ANOVA model, the computation of which is not straightforward for incomplete data. However, many studies in behavioral research use planned missing observational designs, in which the raters partially vary across subjects. Planned missing designs result in incomplete data. Therefore, we simulated planned incomplete data and compared the computational accuracy (bias of point estimates, bias of variability estimates, root mean squared error, and coverage rates) and computational feasibility (convergence rates and estimation time) of three recently proposed estimation methods for ICCs: Markov chain Monte Carlo estimation of Bayesian hierarchical linear models, maximum likelihood estimation of random-effects models, and maximum likelihood estimation of common-factor models. Maximum likelihood estimation of random-effects models with Monte-Carlo confidence intervals is preferred based on all criteria. This article is accompanied by R code, which enables researchers to apply these estimation methods. A demonstration of the R code to a real-data set from an educational context is provided. |
---|---|
Author | Ten Hove, Debby; Van der Ark, L. Andries; Jorgensen, Terrence D. |
Author_xml | – sequence: 1 givenname: Debby orcidid: 0000-0002-1335-4452 surname: Ten Hove fullname: Ten Hove, Debby – sequence: 2 givenname: Terrence D. orcidid: 0000-0001-5111-6773 surname: Jorgensen fullname: Jorgensen, Terrence D. – sequence: 3 givenname: L. Andries orcidid: 0000-0003-3131-7943 surname: Van der Ark fullname: Van der Ark, L. Andries |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40524384$$D View this record in MEDLINE/PubMed |
ContentType | Journal Article |
DBID | AAYXX CITATION NPM 7X8 |
DatabaseName | CrossRef PubMed MEDLINE - Academic |
DatabaseTitle | CrossRef PubMed MEDLINE - Academic |
DatabaseTitleList | PubMed MEDLINE - Academic |
Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Psychology |
EISSN | 1532-7906 |
EndPage | 20 |
ExternalDocumentID | 40524384 10_1080_00273171_2025_2507745 |
Genre | Journal Article |
IngestDate | Wed Jul 02 02:41:19 EDT 2025 Mon Jul 21 06:04:29 EDT 2025 Thu Jul 03 08:20:47 EDT 2025 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | simulation; planned-missing designs; interrater reliability; observational research; Generalizability theory; incomplete data; intraclass correlation coefficients |
ORCID | 0000-0001-5111-6773 0000-0003-3131-7943 0000-0002-1335-4452 |
OpenAccessLink | https://research.vu.nl/en/publications/3d463001-af23-4a72-b8f6-ecd4bc219792 |
PMID | 40524384 |
PQID | 3219325659 |
PQPubID | 23479 |
PageCount | 20 |
ParticipantIDs | proquest_miscellaneous_3219325659 pubmed_primary_40524384 crossref_primary_10_1080_00273171_2025_2507745 |
PublicationCentury | 2000 |
PublicationDate | 2025-Jun-16 |
PublicationDateYYYYMMDD | 2025-06-16 |
PublicationDate_xml | – month: 06 year: 2025 text: 2025-Jun-16 day: 16 |
PublicationDecade | 2020 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States |
PublicationTitle | Multivariate behavioral research |
PublicationTitleAlternate | Multivariate Behav Res |
PublicationYear | 2025 |
SecondaryResourceType | online_first |
SourceID | proquest pubmed crossref |
SourceType | Aggregation Database Index Database |
StartPage | 1 |
URI | https://www.ncbi.nlm.nih.gov/pubmed/40524384 https://www.proquest.com/docview/3219325659 |
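
The abstract's preferred approach, maximum likelihood estimation of a random-effects model followed by Monte-Carlo confidence intervals, can be illustrated with a minimal sketch. This is not the R code that accompanies the article. It is a Python illustration that assumes the subject, rater, and residual variance components and their standard errors have already been estimated elsewhere. All function names and numeric values are hypothetical, and for brevity the draws treat the three estimates as independent, whereas a fuller version would draw from a multivariate normal using their estimated covariance matrix.

```python
import random


def icc_agreement_single(var_subject, var_rater, var_error):
    """ICC(A,1): proportion of total score variance attributable to subjects."""
    return var_subject / (var_subject + var_rater + var_error)


def icc_agreement_average(var_subject, var_rater, var_error, k):
    """ICC(A,k): agreement ICC for the mean score of k raters per subject."""
    return var_subject / (var_subject + (var_rater + var_error) / k)


def monte_carlo_ci(estimates, std_errors, func, n_draws=20000, level=0.95, seed=1):
    """Percentile interval for func(estimates) from simulated parameter draws.

    Simplifying assumption: each estimate is drawn from its own normal
    distribution, i.e. covariances between the estimates are ignored.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        sample = [rng.gauss(est, se) for est, se in zip(estimates, std_errors)]
        draws.append(func(*sample))
    draws.sort()
    lower_idx = int((1 - level) / 2 * n_draws)
    upper_idx = int((1 + level) / 2 * n_draws) - 1
    return draws[lower_idx], draws[upper_idx]


# Hypothetical variance-component estimates and standard errors
# (subject, rater, residual); these are not values from the article.
estimates = (2.0, 0.5, 1.5)
std_errors = (0.4, 0.2, 0.15)

point_estimate = icc_agreement_single(*estimates)
lower, upper = monte_carlo_ci(estimates, std_errors, icc_agreement_single)
print(f"ICC(A,1) = {point_estimate:.3f}, 95% Monte-Carlo CI [{lower:.3f}, {upper:.3f}]")
```

The interval is taken from percentiles of the simulated ICC values rather than from a symmetric normal approximation, so it can be asymmetric around the point estimate, which suits a ratio bounded between 0 and 1.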