Is the use of multiple-choice items and a holistically-scored paragraph translation task fair? Examining a large-scale translation subtest

Bibliographic Details
Published in	Asia Pacific education review Vol. 26; no. 2; pp. 493 - 502
Main Authors	Yang, Zhiqiang; Yu, Chengyuan
Format	Journal Article
Language	English
Published	Dordrecht: Springer Nature B.V.; 교육연구소 (Education Research Institute), 01.06.2025
Subjects	Access to Education; Achievement tests; Association (Psychology); Data Analysis; Education; Educational Equity (Finance); Educational Research; Equal Education; Error analysis; Error Analysis (Language); Females; Gender; Intellectual Disciplines; Interrater Reliability; Language Skills; Language tests; Language Usage; Listening; Listening Comprehension Tests; Literature Reviews; Majors (Students); Measurement Techniques; Multiple choice; Reading comprehension; Reading Tests; Reliability; Scoring; Skills; Statistical analysis; Statistical methods; Students; Test Construction; Test Items; Translation; Validity; Writing; Education Studies (교육학); China
Summary:	This study investigated the test fairness of the translation section of a large-scale English test in China by examining its Differential Test Functioning (DTF) and Differential Item Functioning (DIF) across gender and major. Regarding DTF, the translation section as a whole exhibited partial strong measurement invariance across female and male test takers, and full measurement invariance across test takers in (1) arts & humanities and social sciences (A&HSS) majors and (2) science, technology, engineering, or mathematics (STEM) majors. No major-based DIF was detected. For gender, the objective test items tended to favor male test takers, while the direct translation task favored female test takers. Taken together, the DIF and DTF results suggest that a cancelation effect may be at work: item-level differences in opposite directions may offset one another at the test level. However, the DIF effect sizes were either negligible or slight to moderate, indicating minimal impact on the overall fairness of the translation test task. The study further discusses the necessity of exploring the sources of DIF and the importance of combining DIF and DTF in test fairness research.
ISSN:	1598-1037 1876-407X
DOI:	10.1007/s12564-024-09993-y