The Scaling of Mixed-Item-Format Tests With the One-Parameter and Two-Parameter Partial Credit Models

Bibliographic Details
Published in	Journal of educational measurement Vol. 37; no. 3; pp. 221 - 244
Main Authors	Sykes, Robert C.; Yen, Wendy M.
Format	Journal Article
Language	English
Published	Oxford, UK; Blackwell Publishing Ltd; 01.09.2000; National Council on Measurement in Education
Subjects	Achievement Tests; Credit; Discrimination; Elementary Secondary Education; Information economics; Item response theory; Logistics; Mathematical functions; Maximum likelihood estimation; Modeling; One Parameter Model; Parametric models; Partial Credit Model; Prediction; Rasch Model; Scaling; Standard error; Standard errors; State Programs; Statistics; Test Format; Test scores; Testing Programs; Two Parameter Model
Summary:	Item response theory scalings were conducted for six tests with mixed item formats. These tests differed in their proportions of constructed response (c.r.) and multiple choice (m.c.) items and in overall difficulty. The scalings included those based on scores for the c.r. items that maintained the same number of levels as the item rubrics, produced either from single ratings or from multiple ratings that were averaged and rounded to the nearest integer, as well as scalings for a single form of c.r. items obtained by summing multiple ratings. A one-parameter (1PPC) or two-parameter (2PPC) partial credit model was used for the c.r. items and the one-parameter logistic (1PL) or three-parameter logistic (3PL) model for the m.c. items. Item fit was substantially worse with the combined 1PL/1PPC model than with the 3PL/2PPC model because of the former's restrictive assumptions that there would be no guessing on the m.c. items and that item discrimination would be equal across items and item types. The presence of varying item discriminations resulted in the 1PL/1PPC model producing estimates of item information that could be spuriously inflated for c.r. items that had three or more score levels. Information for some items with summed ratings was usually overestimated by 300% or more for the 1PL/1PPC model. These inflated information values resulted in under-estimated standard errors of ability estimates. The constraints posed by the restricted model suggest limitations on the testing contexts in which the 1PL/1PPC model can be accurately applied.
Bibliography:	ark:/67375/WNG-JF4HLPVT-2; istex:4EC64274C441D663AC39F0E2EDAE4BAA7A5356C7; ArticleID:JEDM221
ISSN:	0022-0655, 1745-3984
DOI:	10.1111/j.1745-3984.2000.tb01084.x
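
Note:	The Summary's link between inflated item information and under-estimated standard errors can be illustrated with a minimal sketch. It assumes the widely used generalized partial credit parametrization (one discrimination `a` and a list of step parameters per item) and the standard 3PL information formula without a scaling constant; the article's exact 2PPC parametrization may differ. The function names and all parameter values are hypothetical and are not taken from the article. The sketch also holds the step parameters fixed when the discrimination changes, whereas a real 1PPC calibration would re-estimate them.

```python
import math


def pc_probs(theta, a, steps):
    """Score-level probabilities for a partial credit item.

    a     : item discrimination (held at a common value under a 1PPC-style model)
    steps : step parameters; the item has len(steps) + 1 score levels
    """
    # Cumulative logits: z_0 = 0, z_k = sum over j <= k of a * (theta - b_j)
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    top = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def pc_info(theta, a, steps):
    """Item information: a^2 times the variance of the item score at theta."""
    p = pc_probs(theta, a, steps)
    mean = sum(k * pk for k, pk in enumerate(p))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(p))
    return a * a * var


def three_pl_info(theta, a, b, c):
    """Item information for a 3PL multiple choice item with guessing parameter c."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def standard_error(info):
    """Asymptotic standard error of an ability estimate given test information."""
    return 1.0 / math.sqrt(info)


# Hypothetical five-level c.r. item (for example, summed ratings); illustrative values only.
steps = [-1.0, -0.3, 0.4, 1.1]
info_free = pc_info(0.0, a=0.5, steps=steps)         # discrimination estimated freely
info_constrained = pc_info(0.0, a=1.0, steps=steps)  # common discrimination imposed
print(info_constrained / info_free)
print(standard_error(info_constrained), standard_error(info_free))
```

Because information grows with the square of the discrimination, imposing a common value that exceeds an item's own discrimination raises the computed information, and the standard error derived from it shrinks accordingly.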