Inter-Rater Agreement in Assessing Risk of Bias in Melanoma Prediction Studies Using the Prediction Model Risk of Bias Assessment Tool (PROBAST): Results from a Controlled Experiment on the Effect of Specific Rater Training

Bibliographic Details
Published in	Journal of clinical medicine Vol. 12; no. 5; p. 1976
Main Authors	Kaiser, Isabelle; Pfahlberg, Annette B; Mathes, Sonja; Uter, Wolfgang; Diehl, Katharina; Steeb, Theresa; Heppt, Markus V; Gefeller, Olaf
Format	Journal Article
Language	English
Published	Switzerland: MDPI AG, 01.03.2023
Subjects	Bias; Cancer; Clinical medicine; Decision making; Germany; Inter-rater agreement; Inter-rater reliability; Melanoma; Oncology, Experimental; Prediction; PROBAST; Ratings & rankings; Risk factors; Risk of bias; Set (Psychology); Systematic review; Validity
Summary:	Assessing the risk of bias (ROB) of studies is an important part of conducting systematic reviews and meta-analyses in clinical medicine. Among the many existing ROB tools, the Prediction Model Risk of Bias Assessment Tool (PROBAST) is a relatively new instrument designed specifically to assess the ROB of prediction studies. In our study, we analyzed the inter-rater reliability (IRR) of PROBAST and the effect of specialized training on the IRR. Six raters independently assessed the ROB of all melanoma risk prediction studies published until 2021 (n = 42) using PROBAST. The raters evaluated the first 20 studies without any guidance other than the published PROBAST literature and assessed the remaining 22 studies after receiving customized training and guidance. Gwet's AC1 was used as the primary measure to quantify the pairwise and multi-rater IRR. Depending on the PROBAST domain, results before training showed slight to moderate IRR (multi-rater AC1 ranging from 0.071 to 0.535). After training, the multi-rater AC1 ranged from 0.294 to 0.780, with a significant improvement for the overall ROB rating and two of the four domains. The largest net gain was achieved in the overall ROB rating (difference in multi-rater AC1: 0.405; 95% CI 0.149 to 0.630). In conclusion, without targeted guidance, the IRR of PROBAST is low, which calls into question its use as an appropriate ROB instrument for prediction studies. Intensive training and guidance manuals with context-specific decision rules are needed to apply and interpret PROBAST correctly and to ensure consistent ROB ratings.
ISSN:	2077-0383
DOI:	10.3390/jcm12051976
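
The summary names Gwet's AC as the agreement measure but does not show how it is computed. The following is a minimal sketch of the multi-rater form of Gwet's first-order agreement coefficient (AC1), assuming nominal categories and no missing ratings; that the study used exactly this variant is an assumption. The function name `gwet_ac1`, the category labels, and the toy ratings are hypothetical illustrations and are not data or code from the study.

```python
from collections import Counter


def gwet_ac1(ratings, categories):
    """Multi-rater Gwet's AC1 for nominal ratings without missing values.

    ratings: list of rows, one row per study, each row holding one label per rater.
    categories: list of all possible labels (at least two).
    """
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")
    if not ratings:
        raise ValueError("at least one rated study is required")

    n = len(ratings)
    agreement_sum = 0.0
    prevalence_sum = {k: 0.0 for k in categories}

    for row in ratings:
        r = len(row)
        if r < 2:
            raise ValueError("each study needs at least two ratings")
        counts = Counter(row)
        if any(label not in prevalence_sum for label in counts):
            raise ValueError("rating label not listed in categories")
        # Share of agreeing rater pairs for this study.
        agreement_sum += sum(c * (c - 1) for c in counts.values()) / (r * (r - 1))
        # Share of raters choosing each category for this study.
        for k in categories:
            prevalence_sum[k] += counts.get(k, 0) / r

    p_a = agreement_sum / n  # observed agreement
    # Chance agreement: sum of pi_k * (1 - pi_k), scaled by 1 / (q - 1).
    p_e = sum((s / n) * (1 - s / n) for s in prevalence_sum.values()) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data: 3 studies rated by 6 raters (not from the study).
    labels = ["low", "high", "unclear"]
    toy = [
        ["high", "high", "high", "high", "unclear", "high"],
        ["low", "high", "unclear", "low", "high", "high"],
        ["high", "high", "high", "high", "high", "high"],
    ]
    print(f"AC1 = {gwet_ac1(toy, labels):.3f}")
```

Unlike Cohen's or Fleiss' kappa, the chance-agreement term here stays small when one category dominates, which is one commonly cited reason for choosing AC1 when most studies receive the same rating. The confidence interval for a difference in coefficients, as reported in the summary, needs a separate variance or resampling procedure that this sketch does not cover.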