Similarity of the cut score in test sets with different item amounts using the modified Angoff, modified Ebel, and Hofstee standard-setting methods for the Korean Medical Licensing Examination

Bibliographic Details
Published in	Journal of Educational Evaluation for Health Professions Vol. 17; pp. 28 - 10
Main Authors	Park, Janghee, Yim, Mi Kyoung, Kim, Na Jin, Ahn, Duck Sun, Kim, Young-Min
Format	Journal Article
Language	English
Published	Korea (South): Korea Health Personnel Licensing Examination Institute (한국보건의료인국가시험원), 2020
Subjects	Educational measurement; Medical education; Medical licensure; Reproducibility of results; Republic of Korea; Education; Medical licensing examination; Standard setting; Ebel; Modified Angoff; Hofstee
ISSN	1975-5937
DOI	10.3352/jeehp.2020.17.28

Summary:	Purpose: The Korea Medical Licensing Exam (KMLE) typically contains a large number of items. The purpose of this study was to investigate whether the cut score differs between evaluating all items of the exam and evaluating only some of the items when conducting standard-setting. Methods: We divided the item sets from the 3 most recent KMLEs, administered over the past 3 years, into 4 subsets per year, each containing 25% of the items, based on item content category, discrimination index, and difficulty index. The entire panel of 15 members assessed all 360 items (100%) from 2017. In split-half set 1, each item set contained 184 items (51%) from 2018, and in split-half set 2, each item set contained 182 items (51%) from 2019, constructed using the same method. We used the modified Angoff, modified Ebel, and Hofstee methods in the standard-setting process. Results: The cut score differed by less than 1% when the same method was used to stratify item subsets containing 25%, 51%, or 100% of the entire set. Rater reliability was higher when fewer items were rated. Conclusion: When the entire item set was divided into equivalent subsets, assessing the exam using a portion of the item set (90 out of 360 items) yielded cut scores similar to those derived using the entire item set. There was a higher correlation between panelists’ individual assessments and the overall assessments.