Can you crowdsource expertise? Comparing expert and crowd‐based scoring keys for three situational judgment tests

Bibliographic Details
Published in: International Journal of Selection and Assessment, Vol. 29, No. 3-4, pp. 467-482
Main Authors: Brown, Matt I.; Grossenbacher, Michael A.; Martin-Raugh, Michelle P.; Kochert, Jonathan; Prewett, Matthew S.
Format: Journal Article
Language: English
Published: Oxford: Blackwell Publishing Ltd, 01.12.2021
Summary: It is common practice to rely on a convenience sample of subject matter experts (SMEs) when developing scoring keys for situational judgment tests (SJTs). However, the defining characteristics of what constitutes an SME are often ambiguous and inconsistent, and sampling SMEs can impose considerable costs. Other research fields have adopted crowdsourcing methods to replace or reproduce judgments thought to require subject matter expertise. We therefore conducted the current study to compare crowdsourced scoring keys with SME-based scoring keys for three SJTs designed for three different job domains: Medicine, Communication, and Military. Our results indicate that scoring keys derived from crowdsourced samples are likely to converge with keys based on SME judgment, regardless of test content (r = .88 to .94 between keys). Agreement between individual MTurk and SME ratings was weakest for the Medical SJT (classification consistency = 61%) relative to the Military and Communication SJTs (80% and 85%, respectively). Although general mental ability and conscientiousness were each related to greater expert similarity among MTurk raters, the average crowd rating outperformed nearly all individual MTurk raters. Using randomly drawn bootstrapped samples of MTurk ratings in each of the three samples, we found that as few as 30–40 raters may provide adequate estimates of SME judgments for most SJT items. These findings suggest the potential usefulness of crowdsourcing as an alternative or supplement to SME-generated scoring keys.

Practitioner points
- We compared expert (SME) and novice (MTurk) ratings of SJT items created for different job contexts.
- MTurk raters were most accurate at identifying the best and worst response options across the three SJTs.
- Convergence between SMEs and MTurk raters was weakest for the most job-specific SJT.
- Crowdsourcing appears to be a useful alternative or supplement to subject matter expertise.
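The bootstrap analysis described in the summary (estimating how many crowd raters are needed before a crowd-based key converges with the SME key) can be illustrated with a short sketch. The code below is not the authors' analysis; the rating matrices are simulated and all names (sme_ratings, mturk_ratings, bootstrap_convergence) are hypothetical. It only shows the general logic of repeatedly drawing random subsamples of raters, averaging their ratings into a key, and correlating that key with the SME-based key.

```python
# Minimal sketch of a bootstrapped rater-subsampling analysis (assumed setup,
# simulated data; not the study's actual code or data).
import numpy as np

rng = np.random.default_rng(0)

# Simulated rating matrices: rows = raters, columns = SJT response options.
n_options = 60
sme_ratings = rng.normal(4.0, 1.0, size=(25, n_options))
# Crowd raters approximate the average SME signal with extra noise.
mturk_ratings = sme_ratings.mean(axis=0) + rng.normal(0.0, 1.2, size=(300, n_options))

sme_key = sme_ratings.mean(axis=0)  # SME-based scoring key (mean rating per option)

def bootstrap_convergence(crowd, sme_key, n_raters, n_boot=1000):
    """Correlate the SME key with keys built from random subsamples
    of `n_raters` crowd raters, across `n_boot` bootstrap draws."""
    corrs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.choice(crowd.shape[0], size=n_raters, replace=True)
        crowd_key = crowd[idx].mean(axis=0)
        corrs[b] = np.corrcoef(crowd_key, sme_key)[0, 1]
    return corrs.mean(), corrs.std()

# Inspect where convergence with the SME key plateaus as raters are added.
for k in (10, 20, 30, 40, 50):
    mean_r, sd_r = bootstrap_convergence(mturk_ratings, sme_key, k)
    print(f"{k:>3} crowd raters: mean r = {mean_r:.3f} (SD = {sd_r:.3f})")
```

In this simulated setup the mean correlation rises and its spread narrows as the subsample grows, which is the pattern the study uses to argue that roughly 30-40 crowd raters may suffice for most items.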
ISSN: 0965-075X; 1468-2389
DOI: 10.1111/ijsa.12353