Although the chapters highlight health outcomes data, the issues addressed are relevant to any content domain. The intent is to find an item selection procedure that yields higher precision for both the domain and composite abilities and a higher percentage of selected items from the item pool. It is organized into six major sections: the nominal categories model, models for response time or multiple attempts on items, models for multiple abilities or cognitive components, nonparametric models, models for nonmonotone items, and models with special assumptions. Interim tests are a central component of district-wide assessment systems, yet their technical quality to guide decisions e. For this purpose, the performance of four item selection methods with and without exposure controls is evaluated and compared in a Monte Carlo simulation to determine how results differ when item exposure control strategies are applied. The results showed that all the new methods were able to recover the item parameters accurately, and the adaptive online calibration designs showed some improvements compared to the random design under most conditions.
A working knowledge of unidimensional item response theory and matrix algebra is assumed. Appropriateness measurement for some multidimensional test batteries. Applied Psychological Measurement, 15, 171-191. Both the theoretical analyses and the studies of simulated data in this paper suggest that the criteria of A-optimality and D-optimality lead to the most accurate estimates when all abilities are intentional, with the former slightly outperforming the latter. Keywords: Kullback-Leibler information, Fisher information, mutual information, multidimensional computerized adaptive test, continuous entropy. Several criteria from the optimal design literature are examined for use with item selection in multidimensional adaptive testing.
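As a rough illustration of how the A-optimality and D-optimality criteria mentioned above can drive item selection, the sketch below assumes a two-dimensional compensatory logistic model in which each item is described by a discrimination vector `a` and an intercept `d`. The function names, the item pool format, and the restriction to two dimensions are assumptions made for this example and are not taken from the source. D-optimality picks the item that maximizes the determinant of the accumulated Fisher information matrix; A-optimality picks the item that minimizes the trace of its inverse.

```python
import math

def prob_correct(a, d, theta):
    """Compensatory two-dimensional logistic model: P = logistic(a . theta + d)."""
    z = a[0] * theta[0] + a[1] * theta[1] + d
    return 1.0 / (1.0 + math.exp(-z))

def item_information(a, d, theta):
    """2x2 Fisher information matrix of one item: P(1 - P) * a a^T."""
    p = prob_correct(a, d, theta)
    w = p * (1.0 - p)
    return [[w * a[0] * a[0], w * a[0] * a[1]],
            [w * a[1] * a[0], w * a[1] * a[1]]]

def add(m1, m2):
    """Element-wise sum of two 2x2 matrices."""
    return [[m1[i][j] + m2[i][j] for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def trace_inv2(m):
    """Trace of the inverse of a 2x2 matrix, (m00 + m11) / det."""
    return (m[0][0] + m[1][1]) / det2(m)

def select_item(pool, current_info, theta, criterion="D"):
    """Return the index of the pool item that is best under the criterion.

    pool         -- list of (a, d) tuples (hypothetical format)
    current_info -- 2x2 information accumulated so far; assumed to be
                    positive definite (for example, it includes a prior term)
    criterion    -- "D" maximizes the determinant, "A" minimizes the
                    trace of the inverse
    """
    best_idx, best_val = None, None
    for idx, (a, d) in enumerate(pool):
        total = add(current_info, item_information(a, d, theta))
        # Negate the A criterion so that both cases are maximizations.
        val = det2(total) if criterion == "D" else -trace_inv2(total)
        if best_val is None or val > best_val:
            best_idx, best_val = idx, val
    return best_idx

# Example: plenty of information on dimension 1, little on dimension 2.
pool = [([1.5, 0.1], 0.0), ([0.1, 1.5], 0.0), ([0.8, 0.8], 0.0)]
current_info = [[1.0, 0.0], [0.0, 0.1]]
theta_hat = [0.0, 0.0]
chosen_d = select_item(pool, current_info, theta_hat, "D")
chosen_a = select_item(pool, current_info, theta_hat, "A")
```

In this toy setting both criteria favor the item that loads mainly on the poorly measured second dimension. The two criteria need not agree in general.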
Results show that Volume and Minimum Angle performed similarly, balancing information for all content areas, while the other three procedures performed similarly, with a high precision for both domain and overall scores when selecting items with the required number of items for each domain. Both methods display substantial increases in precision over alternative item selection and scoring procedures. In addition to the simulation study, the mathematical theories for certain procedures are derived. An application of the continuous response level model to personality measurement. A model for testing with multidimensional items.
The theories are confirmed by the simulation applications. Journal of Educational Measurement, 25, 193-204. Assessments consisting of different domains e. However, simply averaging the domain scores ignores the fact that different domains have different score points, that scores from those domains are related, and that at different score points the relationship between overall score and domain score may be different. In all cases, the cutoff value for each stopping rule needs to be adjusted on a case-by-case basis to find an optimal solution. Recently, there has been increasing interest in reporting subscores.
Multidimensional Item Response Theory by Reckase, M. This is Part V of a series of reports on rationales and techniques of matrix factoring which play an important role in multivariate analysis techniques. with items reporting moderate to high relationships to the primary dimension i. Therefore, instead of stopping the test with a predetermined fixed test length, the authors use a more informative stopping criterion that is directly related to measurement accuracy. It will serve as both an introduction to the subject and also as a comprehensive reference volume for practitioners and researchers. It was found that to report accurate diagnostic information at the subscale level, the subscales need to be highly correlated, or a multidimensional approach should be implemented. Contributions to factor analysis of dichotomous variables.
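The idea of a stopping criterion tied to measurement accuracy, rather than a fixed test length, can be sketched as follows. The source does not specify the rule the authors used, so this example assumes one common choice: stop once the standard error of every ability estimate falls below a cutoff, with a maximum length as a safeguard. The function names, the two-dimensional setting, and the default cutoff values are assumptions for illustration only.

```python
import math

def standard_errors(info):
    """Approximate standard errors from a 2x2 information matrix.

    The standard error of each ability is the square root of the
    corresponding diagonal element of the inverse information matrix.
    """
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    return (math.sqrt(info[1][1] / det), math.sqrt(info[0][0] / det))

def should_stop(info, n_items, se_cutoff=0.3, max_items=40):
    """Variable-length stopping rule (assumed, not the source's exact rule).

    Stops when every standard error is at or below se_cutoff, or when
    max_items have been administered.
    """
    if n_items >= max_items:
        return True
    return all(se <= se_cutoff for se in standard_errors(info))

# Example: both abilities measured precisely versus one still imprecise.
precise = should_stop([[20.0, 0.0], [0.0, 20.0]], n_items=15)
imprecise = should_stop([[20.0, 0.0], [0.0, 4.0]], n_items=15)
at_limit = should_stop([[20.0, 0.0], [0.0, 4.0]], n_items=40)
```

The cutoff and the maximum length here are placeholders. As the text notes, the cutoff for any stopping rule has to be tuned case by case.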
The preferences of each of these criteria for items with specific patterns of parameter values were also assessed. Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Applied Measurement in Education, 5, 193-211. However, the test length difference with K-rule is not very dramatic, indicating that K-rule may not be very sensitive to measurement precision. Three simulation studies were conducted to compare the three new methods by manipulating three factors: test length, item bank design, and level of correlation between coordinate dimensions. Part 1 reviews fundamental topics such as assumption testing, parameter estimation, and the assessment of model and person fit.
This research explores a multidimensional compensatory dichotomous and polytomous item response theory modeling approach for subscale score proficiency estimation, leading toward a more diagnostic solution. If some of the abilities are nuisances, application of the criterion of A_s-optimality or D_s-optimality, which focuses on the subset of intentional abilities, is recommended. Paper presented at the European meeting of the Psychometric Society, Leiden, The Netherlands. Correlations between course grades and test scores, a measure of validity, were similar for all methods, though again slightly lower for the unidimensional scores. Subsequent sections present the model more formally, treat the estimation of its parameters, show how to evaluate its fit to empirical data, illustrate the use of the model through an empirical example, and discuss further applications and remaining research issues.
Using a Markov chain Monte Carlo method in a hierarchical Bayesian framework, the overall and domain-specific abilities, and their correlations, are estimated simultaneously. This article presents a higher-order item response theory framework where an overall and multiple domain abilities are specified in the same model. Two new methods for improving the measurement precision of a general test factor are proposed and evaluated. How the bifactor pattern and test length affect estimation accuracy was also discussed. In all, this book is an inspiring book in measurement and, more importantly, there is no other comparable text on the topic. A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Factor analysis of data matrices.
Domain scores and overall scores have high reliabilities when the correlations between domains are high; reliability is higher than. Statistical theories of mental test scores. It was found that the criteria differed mainly in their preferences of items with different patterns of values for their discrimination parameters. Standard errors and a test of the fit of the model are given. A full chapter is devoted to methods for multidimensional computerized adaptive testing. Normal ogive model on the continuous response level in the multidimensional latent space. However, when the domains are disparate, assuming a single underlying ability across the domains is not tenable.