Can you crowdsource expertise? Comparing expert and crowd‐based scoring keys for three situational judgment tests
Published in: International Journal of Selection and Assessment, Vol. 29, No. 3-4, pp. 467-482
Main Authors:
Format: Journal Article
Language: English
Published: Oxford: Blackwell Publishing Ltd, 01.12.2021
Summary: It is common practice to rely on a convenience sample of subject matter experts (SMEs) when developing scoring keys for situational judgment tests (SJTs). However, the defining characteristics of what constitutes a SME are often ambiguous and inconsistent. Sampling SMEs can also impose considerable costs. Other research fields have adopted crowdsourcing methods to replace or reproduce judgments thought to require subject matter expertise. Therefore, we conducted the current study to compare crowdsourced scoring keys to SME‐based scoring keys for three SJTs designed for three different job domains: Medicine, Communication, and Military. Our results indicate that scoring keys derived from crowdsourced samples are likely to converge with keys based on SME judgment, regardless of test content (r = .88 to .94 between keys). We observed the weakest agreement between individual MTurk and SME ratings for the Medical SJT (classification consistency = 61%) relative to the Military and Communication SJTs (80% and 85%). Although general mental ability and conscientiousness were each related to greater expert similarity among MTurk raters, the average crowd rating outperformed nearly all individual MTurk raters. Using randomly‐drawn bootstrapped samples of MTurk ratings in each of the three samples, we found that as few as 30–40 raters may provide adequate estimates of SME judgments of most SJT items. These findings suggest the potential usefulness of crowdsourcing as an alternative or supplement to SME‐generated scoring keys.
Practitioner points
We compared expert (SME) and novice (MTurk) ratings of SJT items created for different job contexts.
MTurk raters were most accurate at identifying the best and worst response options across three SJTs.
Convergence between SMEs and MTurk raters was weakest for the most job‐specific SJT.
Crowdsourcing appears to be a useful alternative or supplement to subject matter expertise.
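The summary describes a bootstrap procedure for estimating how many crowd raters are needed before the averaged crowd key converges with an SME-based key. The sketch below illustrates the general idea only; it is not the authors' analysis code, and the ratings matrix, rater counts, and scoring scale are simulated placeholders.

```python
"""Illustrative sketch (not the study's actual code) of bootstrapping crowd
rater subsets and checking convergence of the averaged crowd key with an
expert (SME) key. All data below are simulated assumptions."""

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 200 MTurk raters x 30 SJT response options rated on a
# 1-7 scale, plus an SME key (mean expert rating per option).
n_raters, n_items = 200, 30
sme_key = rng.uniform(1, 7, size=n_items)
mturk_ratings = sme_key + rng.normal(0, 1.5, size=(n_raters, n_items))

def bootstrap_convergence(ratings, expert_key, crowd_size, n_boot=1000):
    """Average bootstrap correlation between a random crowd subset's mean
    ratings and the expert key."""
    correlations = []
    for _ in range(n_boot):
        idx = rng.choice(ratings.shape[0], size=crowd_size, replace=True)
        crowd_key = ratings[idx].mean(axis=0)  # provisional crowd scoring key
        correlations.append(np.corrcoef(crowd_key, expert_key)[0, 1])
    return float(np.mean(correlations))

# Trace how convergence changes with crowd size; the summary reports that
# roughly 30-40 raters were adequate for most items in the actual study.
for k in (10, 20, 30, 40, 80):
    r = bootstrap_convergence(mturk_ratings, sme_key, k)
    print(f"{k:3d} raters: mean r with SME key = {r:.3f}")
```

With real rating data in place of the simulated matrix, the same loop would show the crowd size at which the key-level correlation plateaus, which is the quantity the 30-40 rater estimate in the summary refers to.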
ISSN: 0965-075X, 1468-2389
DOI: 10.1111/ijsa.12353