A Bootstrap Approach to Rating Scale Optimization
Eric Van Lente, University of Chicago
George Karabatsos, University of Illinois at Chicago
Kazuaki Uekawa, American Institutes for Research
Submitted to a book project, Access to the Foundations
of Measurement: Professional Identity in the Career of Benjamin D. Wright.
When rating scales are administered
to test respondents, the respondents may, for several possible reasons, not use the rating categories as intended by the test
constructor. This can lead to a psychometrically disordered rating scale, which in turn causes inconsistencies in the measurement
of respondents. Many different methods have been proposed to diagnose rating scale inconsistencies, which in turn help the
analyst decide the “optimal rating scale”, i.e., the particular recategorization of the rating scale that eliminates
inconsistencies. However, there is no clear consensus as to which method is best, and furthermore, it could be argued that
all of the available methods are sample dependent because none of the conventionally used algorithms attempts to maximize model
generalizability. This study introduces a sample-free method of rating scale optimization, based on the bootstrap, which addresses
the issues just mentioned. The bootstrap is a general statistical procedure that simulates the population distribution by
resampling from the original (sample) data set with replacement. A given Rasch rating scale model, employing a particular
rating categorization, is used to analyze the resampled data sets. The fit of that model averaged over these data sets indicates
its generalization error, a measure of how well the model predicts data over the population of respondents. With the
same resampled data sets, generalization error can be computed for each of a number of different Rasch models employing different
rating categorizations. In this way the optimal rating scale is identified as the one that minimizes generalization error.
The bootstrap method will be demonstrated with two real data sets arising from rating scale questionnaires. Following these
illustrations we conclude by discussing other situations in which the bootstrap method may be useful. For example, the method
easily handles the task of rating scale optimization of visual analog scales. Furthermore, for any given data set, the method
naturally handles the task of selecting among the many Rasch models for polytomous response data.
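
The resampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (recode, bootstrap_error, select_categorization, placeholder_misfit) are hypothetical, responses are assumed to be coded as integers 0, 1, 2, ... in a respondents-by-items array, and the misfit argument stands for whatever routine fits a Rasch rating scale model to a data set and returns a misfit value. The placeholder statistic included here is not a Rasch fit statistic; it only makes the loop runnable.

```python
import numpy as np


def recode(data, mapping):
    """Collapse categories: mapping[k] is the new code for original category k."""
    return np.asarray(mapping)[data]


def bootstrap_error(data, mapping, misfit, n_boot=200, seed=0):
    """Average a misfit statistic over bootstrap resamples of respondents (rows)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    recoded = recode(data, mapping)
    scores = []
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)  # draw respondents with replacement
        scores.append(misfit(recoded[rows]))
    return float(np.mean(scores))


def select_categorization(data, candidates, misfit, n_boot=200, seed=0):
    """Score each candidate recategorization and return the one with least error."""
    # Reusing the seed means every candidate sees the same resampled respondents.
    errors = {
        name: bootstrap_error(data, mapping, misfit, n_boot, seed)
        for name, mapping in candidates.items()
    }
    best = min(errors, key=errors.get)
    return best, errors


def placeholder_misfit(sample):
    """Stand-in only (assumption): share of item-by-category cells with < 5 responses.

    Replace with a routine that fits a Rasch rating scale model and returns misfit.
    """
    n_cat = int(sample.max()) + 1
    counts = np.stack([(sample == k).sum(axis=0) for k in range(n_cat)])
    return float((counts < 5).mean())


if __name__ == "__main__":
    # Simulated responses: 100 respondents, 6 items, original categories 0..4.
    rng = np.random.default_rng(1)
    data = rng.integers(0, 5, size=(100, 6))
    candidates = {
        "original 5 categories": [0, 1, 2, 3, 4],
        "collapse top two": [0, 1, 2, 3, 3],
        "collapse to 3": [0, 0, 1, 2, 2],
    }
    best, errors = select_categorization(data, candidates, placeholder_misfit)
    print(best, errors)
```

Passing the model fit in as a function keeps the resampling logic separate from the choice of Rasch model, so the same loop could in principle compare different polytomous models as well as different categorizations.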