An investigation of the consequences for students of using different procedures to equate tests as fit to the Rasch model degenerates

Access & Terms of Use
open access
Copyright: Sadeghi, Rassoul
Abstract
Many large-scale national and international testing programs use the Rasch model to govern the construction of measurement scales that can be used to monitor standards of performance and to track performance over time. A significant issue that arises in such programs is that once a decision has been made to use the model, it is not possible to reverse the decision if the data do not fit the model. Two levels of question result from such a situation. The first involves the issue of misfit to the model: how robust is the model to violations of fit of the data to the model? A second question emerges from the premise that fit to the model is a relative matter; that is, ultimately it is the users' decision whether the data fit the model well enough to suit their purpose. Once this decision has been made, as in the case of large-scale testing programs like the ones referred to above, the focus shifts to the applications of the Rasch model. More specifically, this study examines the consequences of variability of fit to the Rasch model on the measures of student performance obtained from two different equating procedures. Two related simulation studies were conducted to compare the results obtained from two different equating procedures (namely separate and concurrent equating) with the Rasch Simple Logistic model, as data-model fit becomes progressively worse. The results indicate that when data-model fit ranges from good to average (MNSQ ≤ 1.60), there is little or no difference between the results obtained from the different equating procedures. However, when data-model fit ranges from relatively poor to poor (MNSQ > 1.60), the results from the different equating procedures are less comparable. When the results of these two simulation studies are translated to a situation such as that in Australia, where different states use different equating procedures to generate a single comparable score, and these scores are then used to compare students' performances with one another and against predetermined standards or benchmarks, significant equity issues arise. In essence, some students are deemed to be above or below the standards purely as a consequence of the equating procedure selected. For example, students could be deemed to be above a benchmark if separate equating is used to produce the scale, yet these same students could be deemed to fall below the benchmark if concurrent equating is used. The actual consequences of this decision will vary from situation to situation. For example, if the same equating procedure were used each year to equate the data to form a single scale, it could be argued that it does not matter if the results vary from occasion to occasion, because the procedure is consistent for the cohort of students from year to year. However, if other states or countries use a different equating procedure and the results are compared, then there is an equity problem. The extent of the problem depends on the robustness of the model to varying degrees of misfit.
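As a rough illustration of the two procedures the abstract contrasts, the sketch below simulates responses under the Rasch Simple Logistic model and applies a mean/mean shift on common (anchor) items as the separate-equating step. It is a minimal sketch only: the anchor design, the difficulty values, and the use of mean/mean linking are assumptions for illustration, not the thesis's actual simulation design, and proper Rasch calibration (e.g. joint or conditional maximum likelihood estimation) is left out.

```python
import numpy as np

rng = np.random.default_rng(42)


def rasch_prob(theta, delta):
    """Rasch Simple Logistic model: P(X = 1) = exp(theta - delta) / (1 + exp(theta - delta))."""
    return 1.0 / (1.0 + np.exp(-(theta - delta)))


def simulate_responses(theta, delta):
    """Draw a persons-by-items matrix of 0/1 responses under the model."""
    p = rasch_prob(theta[:, None], delta[None, :])
    return (rng.random(p.shape) < p).astype(int)


# Generate data with known parameters, as a simulation study would.
theta_true = rng.normal(0.0, 1.0, size=500)     # person abilities (logits)
delta_true = np.linspace(-2.0, 2.0, num=20)     # item difficulties (logits)
responses = simulate_responses(theta_true, delta_true)
print("proportion correct per item:", np.round(responses.mean(axis=0)[:5], 2), "...")

# --- Separate equating (illustrative mean/mean linking on anchor items) ---
# These difficulty estimates stand in for the output of two *separate*
# calibrations of forms A and B that share five anchor items; the values
# are made up for illustration.
anchor_delta_form_a = np.array([-1.2, -0.4, 0.1, 0.7, 1.3])   # anchors on form A's scale
anchor_delta_form_b = np.array([-1.5, -0.8, -0.3, 0.4, 0.9])  # same anchors on form B's scale

link_constant = np.mean(anchor_delta_form_a - anchor_delta_form_b)

theta_form_b = np.array([-0.6, 0.0, 0.8])          # form B person measures, form B scale
theta_form_b_on_a = theta_form_b + link_constant   # shifted onto form A's scale

print(f"linking constant: {link_constant:.2f} logits")
print("form B measures on form A's scale:", np.round(theta_form_b_on_a, 2))

# --- Concurrent equating ---
# Pools both forms into one sparse persons-by-items matrix (entries missing
# where a person did not take an item) and calibrates everything in a single
# run, so no post-hoc shift is applied.
```

Under mild misfit the two routes should place the form B measures at nearly the same points on the common scale; the abstract's finding is that this agreement breaks down once misfit grows beyond roughly MNSQ 1.60.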
Author(s)
Sadeghi, Rassoul
Publication Year
2006
Resource Type
Thesis
Degree Type
PhD Doctorate
Files
whole.pdf (1.57 MB, Adobe Portable Document Format)