%0 Journal Article %J Journal of Computerized Adaptive Testing %D 2023 %T How Do Trait Change Patterns Affect the Performance of Adaptive Measurement of Change? %A Ming Him Tai %A Allison W. Cooperman %A Joseph N. DeWeese %A David J. Weiss %K adaptive measurement of change %K computerized adaptive testing %K longitudinal measurement %K trait change patterns %B Journal of Computerized Adaptive Testing %V 10 %P 32-58 %G English %N 3 %R 10.7333/2307-1003032 %0 Journal Article %J Applied Psychological Measurement %D 2020 %T A Dynamic Stratification Method for Improving Trait Estimation in Computerized Adaptive Testing Under Item Exposure Control %A Jyun-Hong Chen %A Hsiu-Yi Chao %A Shu-Ying Chen %X When computerized adaptive testing (CAT) is under stringent item exposure control, the precision of trait estimation will substantially decrease. A new item selection method, the dynamic Stratification method based on Dominance Curves (SDC), which is aimed at improving trait estimation, is proposed to mitigate this problem. The objective function of the SDC in item selection is to maximize the sum of test information for all examinees rather than maximizing item information for individual examinees at a single-item administration, as in conventional CAT. To achieve this objective, the SDC uses dominance curves to stratify an item pool into as many strata as there are items in the test, so that the quality of the administered items increases as the test progresses, reducing the likelihood that a high-discrimination item will be administered to an examinee whose ability is not close to the item difficulty. Furthermore, the SDC incorporates a dynamic process for on-the-fly item–stratum adjustment to optimize the use of quality items. Simulation studies were conducted to investigate the performance of the SDC in CAT under item exposure control at different levels of severity.
According to the results, the SDC can efficiently improve trait estimation in CAT, yielding greater precision and more accurate trait estimates than other methods (e.g., the maximum Fisher information method) in most conditions. %B Applied Psychological Measurement %V 44 %P 182-196 %U https://doi.org/10.1177/0146621619843820 %R 10.1177/0146621619843820 %0 Journal Article %J Journal of Educational Measurement %D 2020 %T Item Calibration Methods With Multiple Subscale Multistage Testing %A Wang, Chun %A Chen, Ping %A Jiang, Shengyu %K EM %K marginal maximum likelihood %K missing data %K multistage testing %X Many large-scale educational surveys have moved from linear form design to multistage testing (MST) design. One advantage of MST is that it can provide more accurate latent trait (θ) estimates using fewer items than required by linear tests. However, MST generates incomplete response data by design; hence, questions remain as to how to calibrate items using the incomplete data from MST design. Further complications arise when there are multiple correlated subscales per test, and when items from different subscales need to be calibrated according to their respective score reporting metrics. The current calibration-per-subscale method produces biased item parameters, and no method has been available for resolving the challenge. Drawing on the missing data principle, we showed that when calibrating all items together, Rubin's ignorability assumption is satisfied, so the traditional single-group calibration is sufficient. When calibrating items per subscale, we proposed a simple modification to the current calibration-per-subscale method that helps reinstate the missing-at-random assumption and therefore corrects for the estimation bias that otherwise exists.
Three mainstream calibration methods are discussed in the context of MST: marginal maximum likelihood estimation, the expectation maximization method, and fixed parameter calibration. An extensive simulation study is conducted and a real data example from NAEP is analyzed to provide convincing empirical evidence. %B Journal of Educational Measurement %V 57 %P 3-28 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12241 %R 10.1111/jedm.12241 %0 Journal Article %J Journal of Educational Measurement %D 2020 %T Item Selection and Exposure Control Methods for Computerized Adaptive Testing with Multidimensional Ranking Items %A Chen, Chia-Wen %A Wang, Wen-Chung %A Chiu, Ming Ming %A Ro, Sage %X The use of computerized adaptive testing algorithms for ranking items (e.g., college preferences, career choices) involves two major challenges: unacceptably high computation times (selecting from a large item pool with many dimensions) and biased results (enhanced preferences or intensified examinee responses because of repeated statements across items). To address these issues, we introduce subpool partition strategies for item selection and within-person statement exposure control procedures. Simulations showed that the multinomial method reduces computation time while maintaining measurement precision. Both the freeze and revised Sympson-Hetter online (RSHO) methods controlled the statement exposure rate; RSHO sacrificed some measurement precision but increased pool use. Furthermore, preventing a statement's repetition on consecutive items did not hinder the effectiveness of the freeze or RSHO method, nor did it reduce measurement precision.
%B Journal of Educational Measurement %V 57 %P 343-369 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12252 %R 10.1111/jedm.12252 %0 Journal Article %J Applied Psychological Measurement %D 2020 %T New Efficient and Practicable Adaptive Designs for Calibrating Items Online %A Yinhong He %A Ping Chen %A Yong Li %X When calibrating new items online, it is practicable to first compare all new items according to some criterion and then assign the most suitable one to the current examinee who reaches a seeding location. The modified D-optimal design proposed by van der Linden and Ren (denoted as D-VR design) works within this practicable framework with the aim of directly optimizing the estimation of item parameters. However, the optimal design point for a given new item should be obtained by comparing all examinees in a static examinee pool. Thus, D-VR design still has room for improvement in calibration efficiency from the view of traditional optimal design. To this end, this article incorporates the idea of traditional optimal design into D-VR design and proposes a new online calibration design criterion, namely, excellence degree (ED) criterion. Four different schemes are developed to measure the information provided by the current examinee when implementing this new criterion, and four new ED designs equipped with them are put forward accordingly. Simulation studies were conducted under a variety of conditions to compare the D-VR design and the four proposed ED designs in terms of calibration efficiency. Results showed that the four ED designs outperformed D-VR design in almost all simulation conditions. 
%B Applied Psychological Measurement %V 44 %P 3-16 %U https://doi.org/10.1177/0146621618824854 %R 10.1177/0146621618824854 %0 Journal Article %J Applied Psychological Measurement %D 2020 %T Stratified Item Selection Methods in Cognitive Diagnosis Computerized Adaptive Testing %A Jing Yang %A Hua-Hua Chang %A Jian Tao %A Ningzhong Shi %X Cognitive diagnostic computerized adaptive testing (CD-CAT) aims to obtain more useful diagnostic information by taking advantage of computerized adaptive testing (CAT). Cognitive diagnosis models (CDMs) have been developed to classify examinees into the correct proficiency classes so as to enable more efficient remediation, whereas CAT tailors optimal items to the examinee’s mastery profile. The item selection method is the key factor in the CD-CAT procedure. In recent years, a large number of parametric/nonparametric item selection methods have been proposed. In this article, the authors proposed a series of stratified item selection methods in CD-CAT, which combine stratification with the posterior-weighted Kullback–Leibler (PWKL), nonparametric item selection (NPS), and weighted nonparametric item selection (WNPS) methods, and are named S-PWKL, S-NPS, and S-WNPS, respectively. Two different types of stratification indices were used: original versus novel. The performances of the proposed item selection methods were evaluated via simulation studies and compared with the PWKL, NPS, and WNPS methods without stratification. Manipulated conditions included calibration sample size, item quality, number of attributes, number of strata, and data generation models. Results indicated that the S-WNPS and S-NPS methods performed similarly, and both outperformed the S-PWKL method. Item selection methods with novel stratification indices performed slightly better than those with original stratification indices, and methods without stratification performed the worst.
%B Applied Psychological Measurement %V 44 %P 346-361 %U https://doi.org/10.1177/0146621619893783 %R 10.1177/0146621619893783 %0 Journal Article %J Journal of Educational Measurement %D 2019 %T Computerized Adaptive Testing in Early Education: Exploring the Impact of Item Position Effects on Ability Estimation %A Albano, Anthony D. %A Cai, Liuhan %A Lease, Erin M. %A McConnell, Scott R. %X Studies have shown that item difficulty can vary significantly based on the context of an item within a test form. In particular, item position may be associated with practice and fatigue effects that influence item parameter estimation. The purpose of this research was to examine the relevance of item position specifically for assessments used in early education, an area of testing that has received relatively limited psychometric attention. In an initial study, multilevel item response models fit to data from an early literacy measure revealed statistically significant increases in difficulty for items appearing later in a 20-item form. The estimated linear change in logits for an increase of 1 in position was .024, resulting in a predicted change of .46 logits for a shift from the beginning to the end of the form. A subsequent simulation study examined impacts of item position effects on person ability estimation within computerized adaptive testing. Implications and recommendations for practice are discussed. %B Journal of Educational Measurement %V 56 %P 437-451 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12215 %R 10.1111/jedm.12215 %0 Journal Article %J Educational and Psychological Measurement %D 2019 %T Imputation Methods to Deal With Missing Responses in Computerized Adaptive Multistage Testing %A Dee Duygu Cetin-Berber %A Halil Ibrahim Sari %A Anne Corinne Huggins-Manley %X Routing examinees to modules based on their ability level is a very important aspect of computerized adaptive multistage testing.
However, the presence of missing responses may complicate estimation of examinee ability, which may result in misrouting of individuals. Therefore, missing responses should be handled carefully. This study investigated multiple missing data methods in computerized adaptive multistage testing, including two imputation techniques, the use of full information maximum likelihood, and the scoring of missing data as incorrect. These methods were examined under the missing completely at random, missing at random, and missing not at random frameworks, as well as other testing conditions. Comparisons were made to baseline conditions where no missing data were present. The results showed that the imputation and full information maximum likelihood methods outperformed incorrect scoring methods in terms of average bias, average root mean square error, and correlation between estimated and true thetas. %B Educational and Psychological Measurement %V 79 %P 495-511 %U https://doi.org/10.1177/0013164418805532 %R 10.1177/0013164418805532 %0 Journal Article %J Educational and Psychological Measurement %D 2019 %T Item Selection Criteria With Practical Constraints in Cognitive Diagnostic Computerized Adaptive Testing %A Chuan-Ju Lin %A Hua-Hua Chang %X For item selection in cognitive diagnostic computerized adaptive testing (CD-CAT), ideally, a single item selection index should be created to simultaneously regulate precision, exposure status, and attribute balancing. For this purpose, in this study, we first proposed an attribute-balanced item selection criterion, namely, the standardized weighted deviation global discrimination index (SWDGDI), and subsequently formulated the constrained progressive index (CP_SWDGDI) by casting the SWDGDI in a progressive algorithm.
A simulation study revealed that the SWDGDI method was effective in balancing attribute coverage and the CP_SWDGDI method was able to simultaneously balance attribute coverage and item pool usage while maintaining acceptable estimation precision. This research also demonstrates the advantage of a relatively low number of attributes in CD-CAT applications. %B Educational and Psychological Measurement %V 79 %P 335-357 %U https://doi.org/10.1177/0013164418790634 %R 10.1177/0013164418790634 %0 Journal Article %J Applied Psychological Measurement %D 2019 %T Nonparametric CAT for CD in Educational Settings With Small Samples %A Yuan-Pei Chang %A Chia-Yi Chiu %A Rung-Ching Tsai %X Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation. Although model-based CD-CAT is relatively well researched in the context of large-scale assessment systems, this type of system has not received the same degree of research and development in small-scale settings, such as at the course-based level, where this system would be the most useful. The main obstacle is that the statistical estimation techniques that are successfully applied within the context of a large-scale assessment require large samples to guarantee reliable calibration of the item parameters and an accurate estimation of the examinees’ proficiency class membership. Such samples are simply not obtainable in course-based settings. Therefore, this study proposes a nonparametric item selection (NPS) method that does not require any parameter calibration and thus can be used in small educational programs. The proposed nonparametric CD-CAT uses the nonparametric classification (NPC) method to estimate an examinee’s attribute profile from the item responses; the item that best discriminates between the estimated attribute profile and the other attribute profiles is then selected.
The simulation results show that the NPS method outperformed the compared parametric CD-CAT algorithms and the differences were substantial when the calibration samples were small. %B Applied Psychological Measurement %V 43 %P 543-561 %U https://doi.org/10.1177/0146621618813113 %R 10.1177/0146621618813113 %0 Journal Article %J Journal of Educational Measurement %D 2018 %T Evaluation of a New Method for Providing Full Review Opportunities in Computerized Adaptive Testing—Computerized Adaptive Testing With Salt %A Cui, Zhongmin %A Liu, Chunyan %A He, Yong %A Chen, Hanwei %X Allowing item review in computerized adaptive testing (CAT) is getting more attention in the educational measurement field as more and more testing programs adopt CAT. The research literature has shown that allowing item review in an educational test could result in more accurate estimates of examinees’ abilities. The practice of item review in CAT, however, is hindered by the potential danger of test-manipulation strategies. To provide review opportunities to examinees while minimizing the effect of test-manipulation strategies, researchers have proposed different algorithms to implement CAT with restricted revision options. In this article, we propose and evaluate a new method that implements CAT without any restriction on item review. In particular, we evaluate the new method in terms of the accuracy of ability estimates and the robustness against test-manipulation strategies. This study shows that the newly proposed method is promising, offering a win-win situation: examinees have full freedom to review and change answers, and the impacts of test-manipulation strategies are undermined.
%B Journal of Educational Measurement %V 55 %P 582-594 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12193 %R 10.1111/jedm.12193 %0 Journal Article %J Applied Psychological Measurement %D 2018 %T Item Selection Methods in Multidimensional Computerized Adaptive Testing With Polytomously Scored Items %A Dongbo Tu %A Yuting Han %A Yan Cai %A Xuliang Gao %X Multidimensional computerized adaptive testing (MCAT) has been developed over the past decades, but most MCAT procedures can only deal with dichotomously scored items. However, polytomously scored items have been broadly used in a variety of tests for their advantages of providing more information and testing complicated abilities and skills.
The purpose of this study is to discuss the item selection algorithms used in MCAT with polytomously scored items (PMCAT). Several promising item selection algorithms used in MCAT are extended to PMCAT, and two new item selection methods are proposed to improve the existing selection strategies. Two simulation studies are conducted to demonstrate the feasibility of the extended and proposed methods. The simulation results show that most of the extended item selection methods for PMCAT are feasible and the newly proposed item selection methods perform well. When item pool security is also taken into account, in the two-dimensional case (Study 1) the proposed modified continuous entropy method (MCEM) performed best overall, achieving the lowest item exposure rate while maintaining relatively high accuracy. In higher dimensions (Study 2), results show that mutual information (MUI) and MCEM maintain relatively high estimation accuracy, and item exposure rates decrease as the correlation increases. %B Applied Psychological Measurement %V 42 %P 677-694 %U https://doi.org/10.1177/0146621618762748 %R 10.1177/0146621618762748 %0 Journal Article %J Applied Psychological Measurement %D 2018 %T Latent Class Analysis of Recurrent Events in Problem-Solving Items %A Haochen Xu %A Guanhua Fang %A Yunxiao Chen %A Jingchen Liu %A Zhiliang Ying %X Computer-based assessment of complex problem-solving abilities is becoming more and more popular. In such an assessment, the entire problem-solving process of an examinee is recorded, providing detailed information about the individual, such as behavioral patterns, speed, and learning trajectory. The problem-solving processes are recorded in a computer log file, which is a time-stamped documentation of events related to task completion. As opposed to cross-sectional response data from traditional tests, process data in log files are massive and irregularly structured, calling for effective exploratory data analysis methods.
Motivated by a specific complex problem-solving item, “Climate Control,” in the 2012 Programme for International Student Assessment, the authors propose a latent class analysis approach to analyzing the events that occur in problem-solving processes. The exploratory latent class analysis yields meaningful latent classes. Simulation studies are conducted to evaluate the proposed approach. %B Applied Psychological Measurement %V 42 %P 478-498 %U https://doi.org/10.1177/0146621617748325 %R 10.1177/0146621617748325 %0 Journal Article %J Journal of Educational Measurement %D 2018 %T On-the-Fly Constraint-Controlled Assembly Methods for Multistage Adaptive Testing for Cognitive Diagnosis %A Liu, Shuchang %A Cai, Yan %A Tu, Dongbo %X This study applied the mode of on-the-fly assembled multistage adaptive testing to cognitive diagnosis (CD-OMST). Several module assembly methods for CD-OMST were proposed and compared in terms of measurement precision, test security, and constraint management. The module assembly methods in the study included the maximum priority index method (MPI), the revised maximum priority index (RMPI), the weighted deviation model (WDM), and two revised Monte Carlo methods (R1-MC, R2-MC). Simulation results showed that, on the whole, the CD-OMST performs well in that it not only has acceptable attribute pattern correct classification rates but also satisfies both statistical and nonstatistical constraints; the RMPI method was generally better than the MPI method, the R2-MC method was generally better than the R1-MC method, and the two revised Monte Carlo methods performed best in terms of test security and constraint management, whereas the RMPI and WDM methods worked best in terms of measurement precision.
The study is expected not only to provide information about how to combine MST and CD using an on-the-fly method and how these assembly methods in CD-OMST perform relative to each other, but also to offer guidance for practitioners assembling modules in CD-OMST under both statistical and nonstatistical constraints. %B Journal of Educational Measurement %V 55 %P 595-613 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12194 %R 10.1111/jedm.12194 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Bayesian Perspectives on Adaptive Testing %A Wim J. van der Linden %A Bingnan Jiang %A Hao Ren %A Seung W. Choi %A Qi Diao %K Bayesian Perspective %K CAT %X

Although adaptive testing is usually treated from the perspective of maximum-likelihood parameter estimation and maximum-information item selection, a Bayesian perspective is more natural, statistically efficient, and computationally tractable. This observation not only holds for the core process of ability estimation but extends to such processes as item calibration and real-time monitoring of item security as well. Key elements of the approach are parametric modeling of each relevant process, updating of the parameter estimates after the arrival of each new response, and optimal design of the next step.

The purpose of the symposium is to illustrate the role of Bayesian statistics in this approach. The first presentation discusses a basic Bayesian algorithm for the sequential update of any parameter in adaptive testing and illustrates the idea of Bayesian optimal design for the two processes of ability estimation and online item calibration. The second presentation generalizes the ideas to the case of adaptive testing with polytomous items. The third presentation uses the fundamental Bayesian idea of sampling from updated posterior predictive distributions (“multiple imputations”) to deal with the problem of scoring incomplete adaptive tests.
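The sequential Bayesian update described above can be illustrated with a minimal grid-based posterior update after each response. This is an illustrative sketch only, assuming a 2-PL response model and a discrete theta grid; it is not the presenters' implementation, and all function names are hypothetical:

```python
import math

def posterior_update(prior, thetas, a, b, x):
    """One sequential Bayesian update on a theta grid after observing
    response x (1 = correct, 0 = incorrect) to a 2-PL item with
    discrimination a and difficulty b."""
    def p(theta):
        # 2-PL probability of a correct response
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))
    likelihood = [p(t) if x == 1 else 1.0 - p(t) for t in thetas]
    posterior = [pr * lk for pr, lk in zip(prior, likelihood)]
    z = sum(posterior)  # normalizing constant
    return [w / z for w in posterior]

def eap(posterior, thetas):
    """Expected a posteriori (EAP) ability estimate from the grid posterior."""
    return sum(w * t for w, t in zip(posterior, thetas))
```

After each item, the posterior from the previous step becomes the prior for the next, which is what makes the update fully sequential.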

Session Video 1

Session Video 2

 

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Computerized Adaptive Testing for Cognitive Diagnosis in Classroom: A Nonparametric Approach %A Yuan-Pei Chang %A Chia-Yi Chiu %A Rung-Ching Tsai %K CD-CAT %K non-parametric approach %X

In the past decade, CDMs of educational test performance have received increasing attention among educational researchers (for details, see Fu & Li, 2007, and Rupp, Templin, & Henson, 2010). CDMs of educational test performance decompose the ability domain of a given test into specific skills, called attributes, each of which an examinee may or may not have mastered. The resulting attribute profile documents the individual’s strengths and weaknesses within the ability domain. Cognitive Diagnostic Computerized Adaptive Testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation (e.g., Cheng & Chang, 2007; Cheng, 2009; Liu, You, Wang, Ding, & Chang, 2013; Tatsuoka & Tatsuoka, 1997). While model-based CD-CAT is relatively well-researched in the context of large-scale assessments, this type of system has not received the same degree of development in small-scale settings, where it would be most useful. The main challenge is that the statistical estimation techniques successfully applied to parametric CD-CAT require large samples to guarantee the reliable calibration of item parameters and accurate estimation of examinees’ attribute profiles. In response to the challenge, a nonparametric approach that does not require any parameter calibration, and thus can be used in small educational programs, is proposed. The proposed nonparametric CD-CAT relies on the same principle as the regular CAT algorithm, but uses the nonparametric classification method (Chiu & Douglas, 2013) to assess and update the student’s ability state while the test proceeds. Based on a student’s initial responses, a neighborhood of candidate proficiency classes is identified, and items not characteristic of the chosen proficiency classes are precluded from being chosen next. The response to the next item then allows for an update of the skill profile, and the set of possible proficiency classes is further narrowed.
In this manner, the nonparametric CD-CAT cycles through item administration and update stages until the most likely proficiency class has been pinpointed. The simulation results show that the proposed method outperformed the compared parametric CD-CAT algorithms and the differences were significant when the item parameter calibration was not optimal.
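The nonparametric classification step at the heart of this approach can be sketched as a Hamming-distance classifier over ideal response patterns, in the spirit of Chiu and Douglas (2013). This is an illustrative sketch only, assuming a conjunctive (DINA-type) ideal-response rule; it is not the authors' code:

```python
from itertools import product

def ideal_response(alpha, q_row):
    """Conjunctive ideal response: 1 iff the profile alpha masters
    every attribute the item's Q-matrix row requires."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))

def npc_classify(responses, Q):
    """Pick the attribute profile whose ideal response pattern is
    closest (in Hamming distance) to the observed response vector."""
    K = len(Q[0])  # number of attributes
    best, best_d = None, None
    for alpha in product((0, 1), repeat=K):
        ideal = [ideal_response(alpha, row) for row in Q]
        d = sum(r != i for r, i in zip(responses, ideal))
        if best_d is None or d < best_d:
            best, best_d = alpha, d
    return best
```

Because no item parameters are estimated, the classifier works even with the very small calibration samples typical of classroom settings; the adaptive algorithm then selects the next item to best separate the current estimate from its competing profiles.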

References

Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74, 619-632.

Cheng, Y., & Chang, H. (2007). The modified maximum global discrimination index method for cognitive diagnostic CAT. In D. Weiss (Ed.), Proceedings of the 2007 GMAC Computerized Adaptive Testing Conference.

Chiu, C.-Y., & Douglas, J. A. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250.

Fu, J., & Li, Y. (2007). An integrative review of cognitively diagnostic psychometric models. Paper presented at the Annual Meeting of the National Council on Measurement in Education. Chicago, Illinois.

Liu, H., You, X., Wang, W., Ding, S., & Chang, H. (2013). The development of computerized adaptive testing with cognitive diagnosis for an English achievement test in China. Journal of Classification, 30, 152-172.

Rupp, A. A., Templin, J. L., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford.

Tatsuoka, K. K., & Tatsuoka, M. M. (1997). Computerized cognitive diagnostic adaptive testing: Effect on remedial instruction as empirical validation. Journal of Educational Measurement, 34, 3–20.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Considerations in Performance Evaluations of Computerized Formative Assessments %A Michael Chajewski %A John Harnisher %K algebra %K Formative Assessment %K Performance Evaluations %X

Computerized adaptive instruments have been widely established and used in the context of summative assessments for purposes including licensure, admissions, and proficiency testing. The benefits of examinee-tailored examinations, which can provide estimates of performance that are more reliable and valid, have in recent years attracted a greater audience (e.g., patient-oriented outcomes, test prep, etc.). Formative assessments, which are most widely understood in their implementation as diagnostic tools, have recently started to expand into lesser-known areas of computerized testing, such as implementations of instructional designs aiming to maximize examinee learning through targeted practice.

Using a CAT instrument within the framework of evaluating repeated examinee performances (in settings such as Quiz Bank practice, for example) poses unique challenges not germane to summative assessments. The scale on which item parameters (and subsequently examinee performance estimates such as maximum likelihood estimates) are determined usually does not take change over time into consideration. While vertical scaling features resolve the learning acquisition problem, most content practice engines do not make use of explicit practice windows that could be vertically aligned. Alternatively, multidimensional (MIRT) and hierarchical item response theory (HIRT) models allow for the specification of random effects associated with change over time in examinees’ skills, but they are often complex and require content and usage resources not often available.

The research submitted for consideration simulated examinees’ repeated variable-length Quiz Bank practice in algebra using a 500-item 1-PL operational item pool. The stability simulations sought to determine which rolling item interval size would provide the most informative insight into the examinees’ learning progression over time. Estimates were evaluated in terms of reduction in estimate uncertainty, bias, and RMSD relative to the true and total-item-based ability estimates. It was found that rolling item intervals of 20-25 items provided the best reduction of uncertainty around the estimate without compromising the ability to provide informed performance estimates to students. However, while intervals of 20-25 items asymptotically tended to provide adequate estimates of performance, changes over shorter periods of time assessed with shorter quizzes could not be detected, as those changes would be masked by the performance based on the full interval considered. Implications for infrastructure (such as recommendation engines, etc.), product, and scale development are discussed.

Session video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2017 %T The Development of MST Test Information for the Prediction of Test Performances %A Ryoungsun Park %A Jiseon Kim %A Hyewon Chung %A Barbara G. Dodd %X The current study proposes novel methods to predict multistage testing (MST) performance without conducting simulations. This method, called MST test information, is based on analytic derivation of standard errors of ability estimates across theta levels. We compared standard errors derived analytically to the simulation results to demonstrate the validity of the proposed method in both measurement precision and classification accuracy. The results indicate that the MST test information effectively predicted the performance of MST. In addition, the results of the current study highlighted the relationship among the test construction, MST design factors, and MST performance. %B Educational and Psychological Measurement %V 77 %P 570-586 %U http://dx.doi.org/10.1177/0013164416662960 %R 10.1177/0013164416662960 %0 Journal Article %J Journal of Educational Measurement %D 2017 %T Dual-Objective Item Selection Criteria in Cognitive Diagnostic Computerized Adaptive Testing %A Kang, Hyeon-Ah %A Zhang, Susu %A Chang, Hua-Hua %X The development of cognitive diagnostic-computerized adaptive testing (CD-CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual-objective CD-CAT that simultaneously addresses examinees' attribute mastery status and overall test performance. The new procedure is based on the Jensen-Shannon (JS) divergence, a symmetrized version of the Kullback-Leibler divergence. 
We show that the JS divergence resolves the noncomparability problem of the dual information index and has close relationships with Shannon entropy, mutual information, and Fisher information. The performance of the JS divergence is evaluated in simulation studies in comparison with the methods available in the literature. Results suggest that the JS divergence achieves comparable or more precise recovery of latent trait variables relative to the existing methods and maintains practical advantages in computation and item pool usage. %B Journal of Educational Measurement %V 54 %P 165–183 %U http://dx.doi.org/10.1111/jedm.12139 %R 10.1111/jedm.12139 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T An Imputation Approach to Handling Incomplete Computerized Tests %A Troy Chen %A Chi-Yu Huang %A Chunyan Liu %K CAT %K imputation approach %K incomplete computerized test %X

As technology advances, computerized adaptive testing (CAT) is becoming increasingly popular as it allows tests to be tailored to an examinee’s ability.  Nevertheless, examinees might devise testing strategies to use CAT to their advantage.  For instance, if only the items that examinees answer count towards their score, then a higher theta score might be obtained by spending more time on items at the beginning of the test and skipping items at the end if time runs out. This type of gaming can be discouraged if examinees’ scores are lowered or “penalized” based on the amount of non-response.

The goal of this study was to devise a penalty function that would meet two criteria: (1) the greater the omit rate, the greater the penalty, and (2) examinees with the same ability and the same omit rate should receive the same penalty. To create the penalty, theta was first estimated based on only the items the examinee responded to. Next, the expected number-correct score (EXR) was obtained from this theta and the test characteristic curve. A penalized expected number-correct score was obtained by multiplying EXR by the proportion of items the examinee responded to. Finally, the penalized theta was identified from the test characteristic curve. Based on the penalized theta and the item parameters of an unanswered item, the likelihood of a correct response was computed and employed to estimate the imputed score for the unanswered item.
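The scoring chain described above (theta from answered items, expected number-correct via the test characteristic curve, a proportional penalty, then inversion back to a penalized theta) can be sketched as follows. This is a minimal illustration only: the abstract does not specify the IRT model, so a 1PL model, a bisection inversion of the TCC, and all function names are assumptions.

```python
import math

def p_correct(theta, b):
    """1PL probability of a correct response to an item with difficulty b (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def tcc(theta, difficulties):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_correct(theta, b) for b in difficulties)

def invert_tcc(target, difficulties, lo=-6.0, hi=6.0, tol=1e-6):
    """Find the theta whose expected number-correct score equals `target`
    (bisection works because the TCC is monotone increasing in theta)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, difficulties) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def penalized_theta(theta_answered, difficulties, n_answered):
    """Penalize theta by the proportion of items answered, as described above."""
    exr = tcc(theta_answered, difficulties)            # expected number correct
    exr_pen = exr * (n_answered / len(difficulties))   # proportion-answered penalty
    return invert_tcc(exr_pen, difficulties)
```

An examinee who answered every item is not penalized, and the penalized theta drops as the omit rate grows; the imputed score for an unanswered item would then be `p_correct(penalized_theta(...), b)`.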

Two datasets were used to generate tests with completion rates of 50%, 80%, and 90%. The first dataset included real data in which approximately 4,500 examinees responded to a 21-item test, which provided a baseline/truth; sampling was done to achieve the three completion-rate conditions. The second dataset consisted of simulated item scores for 50,000 simulees under a 1-2-4 multistage CAT design in which each stage contained seven items. Imputed item scores for unanswered items were computed using a variety of values for G (and therefore T). Three other approaches to handling unanswered items were also considered: all correct (i.e., T = 0), all incorrect (i.e., T = 1), and random scoring (i.e., T = 0.5).

The current study investigated the impact on theta estimates resulting from the proposed approach to handling unanswered items in a fixed-length CAT. In real testing situations, when examinees do not finish a test, it is hard to tell whether they tried diligently but ran out of time or whether they attempted to manipulate the scoring engine.  To handle unfinished tests with penalties, the proposed approach considers examinees’ abilities and incompletion rates. The results of this study provide direction for psychometric practitioners when considering penalties for omitted responses.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1vznZeO3nsZZK0k6_oyw5c9ZTP8uyGnXh %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Item Parameter Drifting and Online Calibration %A Hua-Hua Chang %A Rui Guo %K online calibration %K Parameter Drift %X

Item calibration is among the most important topics in item response theory (IRT). Since many large-scale testing programs have switched from the paper-and-pencil (P&P) testing mode to the computerized adaptive testing (CAT) mode, developing methods for efficiently calibrating new items has become vital. Among the many item calibration processes proposed for CAT, online calibration is the most cost-effective. This presentation introduces an online (re)calibration design to detect item parameter drift in CAT in both unidimensional and multidimensional environments. Specifically, for the unidimensional CAT model, a two-stage online calibration optimal design is proposed that implements a proportional density index algorithm. For the multidimensional CAT model, a four-quadrant online calibration pretest item selection design with the proportional density index algorithm is proposed. Comparisons were made between different online calibration item selection strategies. Results showed that under unidimensional CAT, the proposed modified two-stage item selection criterion with the proportional density algorithm outperformed the other existing methods in terms of item parameter calibration and item parameter drift detection; under multidimensional CAT, the online (re)calibration technique with the proposed four-quadrant item selection design with the proportional density index outperformed the other methods.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Item Selection Strategies for Developing CAT in Indonesia %A Istiani Chandra %K CAT %K Indonesia %K item selection strategies %X

Recently, the development of computerized testing in Indonesia has become quite promising. Many government institutions have used the technology for recruitment. Since the Indonesian Army acknowledged the benefits of computerized adaptive testing (CAT) over conventional test administration, the issue of selecting the first item has attracted attention. Given CAT’s basic philosophy, several methods can be used to select the first item, such as educational level, an ability estimate from item simulation, or other methods. The question remains how to apply these methods most effectively in the context of constrained adaptive testing. This paper reviews such strategies as they appear in the relevant literature. The focus is on studies that have been conducted to evaluate the effectiveness of item selection strategies for dichotomous scoring. The paper also discusses the strengths and weaknesses of each strategy group using examples from simulation studies. No new research is presented; rather, a compendium of models is reviewed from the perspective of a newcomer, giving a broad view of first-item selection strategies.

 

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://www.youtube.com/watch?v=2KuFrRATq9Q %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T A Large-Scale Progress Monitoring Application with Computerized Adaptive Testing %A Okan Bulut %A Damien Cormier %K CAT %K Large-Scale tests %K Process monitoring %X

Many conventional assessment tools are available to teachers in schools for monitoring student progress in a formative manner. The outcomes of these assessment tools are essential to teachers’ instructional modifications and schools’ data-driven educational strategies, such as using remedial activities and planning instructional interventions for students with learning difficulties. When measuring student progress toward instructional goals or outcomes, assessments should be not only considerably precise but also sensitive to individual change in learning. Unlike conventional paper-and-pencil assessments, which are not necessarily appropriate for every student, computerized adaptive tests (CATs) are highly capable of estimating growth with minimal and consistent error. Therefore, CATs can be used as a progress monitoring tool for measuring student growth.

This study focuses on an operational CAT assessment that has been used for measuring student growth in reading during the academic school year. The sample of this study consists of nearly 7 million students from the 1st grade to the 12th grade in the US. The students received a CAT-based reading assessment periodically during the school year. The purpose of these periodical assessments is to measure the growth in students’ reading achievement and identify the students who may need additional instructional support (e.g., academic interventions). Using real data, this study aims to address the following research questions: (1) How many CAT administrations are necessary to make psychometrically sound decisions about the need for instructional changes in the classroom or when to provide academic interventions?; (2) What is the ideal amount of time between CAT administrations to capture student growth for the purpose of producing meaningful decisions from assessment results?

To address these research questions, we first used the Theil-Sen estimator for robustly fitting a regression line to each student’s test scores obtained from a series of CAT administrations. Next, we used the conditional standard error of measurement (cSEM) from the CAT administrations to create an error band around the Theil-Sen slope (i.e., student growth rate). This process resulted in the normative slope values across all the grade levels. The optimal number of CAT administrations was established from grade-level regression results. The amount of time needed for progress monitoring was determined by calculating the amount of time required for a student to show growth beyond the median cSEM value for each grade level. The results showed that the normative slope values were the highest for lower grades and declined steadily as grade level increased. The results also suggested that the CAT-based reading assessment is most useful for grades 1 through 4, since most struggling readers requiring an intervention appear to be within this grade range. Because CAT yielded very similar cSEM values across administrations, the amount of error in the progress monitoring decisions did not seem to depend on the number of CAT administrations.
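The Theil-Sen estimator used above for the growth slopes is the median of the slopes over all pairs of (time, score) points, which is what makes it robust to occasional aberrant test scores. A minimal sketch (the function name is hypothetical):

```python
from itertools import combinations
from statistics import median

def theil_sen_slope(times, scores):
    """Theil-Sen growth rate: median of all pairwise slopes,
    robust to outlying test scores."""
    points = list(zip(times, scores))
    slopes = [(s2 - s1) / (t2 - t1)
              for (t1, s1), (t2, s2) in combinations(points, 2)
              if t2 != t1]
    return median(slopes)
```

A single wildly high or low CAT score shifts only a minority of the pairwise slopes, so the median, unlike an ordinary least-squares slope, is largely unaffected.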

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1uGbCKenRLnqTxImX1fZicR2c7GRV6Udc %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T MHK-MST Design and the Related Simulation Study %A Ling Yuyu %A Zhou Chenglin %A Ren Jie %K language testing %K MHK %K multistage testing %X

The MHK is a national standardized exam that tests and rates Chinese language proficiency. It assesses non-native Chinese minorities’ ability to use the Chinese language in their daily, academic, and professional lives. Computerized multistage adaptive testing (MST) combines features of conventional paper-and-pencil (P&P) testing and item-level computerized adaptive testing (CAT); it is a computer-based test design that uses the item set (module) as the unit of administration and scoring. MST estimates extreme ability values more accurately than conventional P&P testing, and it uses CAT’s adaptive character to reduce test length and score-reporting time. At present, MST is used in some large-scale tests, such as the Uniform CPA Examination and the Graduate Record Examination (GRE). It is therefore worthwhile to develop MST applications in China.

Based on consideration of the MHK’s characteristics and its future development, the researchers began with the design of MHK-MST. This simulation study was conducted to validate the performance of the MHK-MST system. Real difficulty parameters of MHK items and simulated ability parameters of candidates were used to generate the original score matrix, and the item modules were delivered to the candidates following the adaptive procedures set according to the path rules. This simulation study provides a sound basis for the implementation of MHK-MST.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T New Challenges (With Solutions) and Innovative Applications of CAT %A Chun Wang %A David J. Weiss %A Xue Zhang %A Jian Tao %A Yinhong He %A Ping Chen %A Shiyu Wang %A Susu Zhang %A Haiyan Lin %A Xiaohong Gao %A Hua-Hua Chang %A Zhuoran Shang %K CAT %K challenges %K innovative applications %X

Over the past several decades, computerized adaptive testing (CAT) has profoundly changed the administration of large-scale aptitude tests, state-wide achievement tests, professional licensure exams, and health outcome measures. While many challenges of CAT have been successfully addressed due to the continual efforts of researchers in the field, there are still many remaining, longstanding challenges that have yet to be resolved. This symposium will begin with three presentations, each of which provides a sound solution to one of the unresolved challenges. They are (1) item calibration when responses are “missing not at random” from CAT administration; (2) online calibration of new items when person traits have non-ignorable measurement error; (3) establishing consistency and asymptotic normality of latent trait estimation when allowing item response revision in CAT. In addition, this symposium also features innovative applications of CAT. In particular, there is emerging interest in using cognitive diagnostic CAT to monitor and detect learning progress (4th presentation). Last but not least, the 5th presentation illustrates the power of multidimensional polytomous CAT that permits rapid identification of hospitalized patients’ rehabilitative care needs in health outcomes measurement. We believe this symposium covers a wide range of interesting and important topics in CAT.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1Wvgxw7in_QCq_F7kzID6zCZuVXWcFDPa %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T A New Cognitive Diagnostic Computerized Adaptive Testing for Simultaneously Diagnosing Skills and Misconceptions %A Bor-Chen Kuo %A Chun-Hua Chen %K CD-CAT %K Misconceptions %K Simultaneous diagnosis %X

In educational diagnosis, diagnosing misconceptions is as important as diagnosing skills. However, traditional cognitive diagnostic computerized adaptive testing (CD-CAT) is usually developed to diagnose skills only. This study proposes a new CD-CAT that can simultaneously diagnose skills and misconceptions. The proposed CD-CAT is based on a recently published CDM, the simultaneously identifying skills and misconceptions (SISM) model (Kuo, Chen, & de la Torre, in press). A new item selection algorithm is also proposed to achieve high adaptive testing performance. In simulation studies, we compare the new item selection algorithm with three existing item selection methods: the Kullback–Leibler (KL) and posterior-weighted KL (PWKL) methods proposed by Cheng (2009) and the modified PWKL (MPWKL) proposed by Kaplan, de la Torre, and Barrada (2015). The results show that the proposed CD-CAT can efficiently diagnose skills and misconceptions; the accuracy of the new item selection algorithm is close to that of the MPWKL but with a lighter computational burden; and the new algorithm outperforms the KL and PWKL methods in diagnosing skills and misconceptions.

References

Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74(4), 619–632. doi: 10.1007/s11336-009-9123-2

Kaplan, M., de la Torre, J., & Barrada, J. R. (2015). New item selection methods for cognitive diagnosis computerized adaptive testing. Applied Psychological Measurement, 39(3), 167–188. doi:10.1177/0146621614554650

Kuo, B.-C., Chen, C.-H., & de la Torre, J. (in press). A cognitive diagnosis model for identifying coexisting skills and misconceptions. Applied Psychological Measurement.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T New Results on Bias in Estimates due to Discontinue Rules in Intelligence Testing %A Matthias von Davier %A Youngmi Cho %A Tianshu Pan %K Bias %K CAT %K Intelligence Testing %X

The presentation provides new results on a form of adaptive testing that is used frequently in intelligence testing. In these tests, items are presented in order of increasing difficulty, and the presentation of items is adaptive in the sense that each subtest session is discontinued once a test taker produces a certain number of incorrect responses in sequence. The subsequent (not observed) responses are commonly scored as wrong for that subtest, even though the test taker has not seen these items. Discontinuation rules allow a certain form of adaptiveness in both paper-based and computer-based testing and help reduce testing time.
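The discontinue-rule scoring just described is straightforward to state in code. A minimal sketch (the function name is hypothetical, and `stop_after` is an illustrative parameter, since operational tests vary in the number of consecutive errors required):

```python
def discontinue_scores(responses, stop_after=3):
    """Apply a discontinue rule to an ordered (easiest-to-hardest) subtest:
    administration stops after `stop_after` consecutive incorrect responses,
    and the remaining, unseen items are scored as wrong."""
    scored, run = [], 0
    for r in responses:
        scored.append(r)
        run = run + 1 if r == 0 else 0  # track the current run of errors
        if run >= stop_after:
            break
    scored.extend([0] * (len(responses) - len(scored)))  # unseen items scored 0
    return scored
```

Scoring the unseen tail as zeros is exactly the operational convention whose distributional consequences the presentation analyzes.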

Two lines of research are relevant: studies that directly assess the impact of discontinuation rules, and studies that more broadly look at the impact of scoring rules on test results with a large number of not-administered or not-reached items. He & Wolfe (2012) compared different ability estimation methods for this discontinuation-rule adaptation of test length in a simulation study. However, to our knowledge there has been no rigorous analytical study of the underlying distributional changes of the response variables under discontinuation rules. It is important to point out that the results obtained by He & Wolfe (2012) agree with results presented by, for example, DeAyala, Plake, & Impara (2001) as well as Rose, von Davier, & Xu (2010) and Rose, von Davier, & Nagengast (2016) in that ability estimates are biased most when scoring the not-observed responses as wrong. Discontinuation rules combined with scoring the non-administered items as wrong are used operationally in several major intelligence tests, so more research is needed in order to improve this particular type of adaptiveness in testing practice.

The presentation extends existing research on adaptiveness by discontinue-rules in intelligence tests in multiple ways: First, a rigorous analytical study of the distributional properties of discontinue-rule scored items is presented. Second, an extended simulation is presented that includes additional alternative scoring rules as well as bias-corrected ability estimators that may be suitable to improve results for discontinue-rule scored intelligence tests.

References: DeAyala, R. J., Plake, B. S., & Impara, J. C. (2001). The impact of omitted responses on the accuracy of ability estimation in item response theory. Journal of Educational Measurement, 38, 213-234.

He, W., & Wolfe, E. W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72(5), 808-826. doi:10.1177/0013164412441937

Rose, N., von Davier, M., & Xu, X. (2010). Modeling non-ignorable missing data with item response theory (IRT; ETS RR-10-11). Princeton, NJ: Educational Testing Service.

Rose, N., von Davier, M., & Nagengast, B. (2016). Modeling omitted and not-reached items in IRT models. Psychometrika. doi:10.1007/s11336-016-9544-7

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Scripted On-the-fly Multistage Testing %A Edison Choe %A Bruce Williams %A Sung-Hyuck Lee %K CAT %K multistage testing %K On-the-fly testing %X

On-the-fly multistage testing (OMST) was introduced recently as a promising alternative to preassembled MST. A decidedly appealing feature of both is the reviewability of items within the current stage. However, the fundamental difference is that, instead of routing to a preassembled module, OMST adaptively assembles a module at each stage according to an interim ability estimate. This produces more individualized forms with finer measurement precision, but imposing nonstatistical constraints and controlling item exposure become more cumbersome. One recommendation is to use the maximum priority index followed by a remediation step to satisfy content constraints, and the Sympson-Hetter method with a stratified item bank for exposure control.

However, these methods can be computationally expensive, thereby impeding practical implementation. Therefore, this study investigated the script method as a simpler solution to the challenge of strict content balancing and effective item exposure control in OMST. The script method was originally devised as an item selection algorithm for CAT and generally proceeds as follows: For a test with m items, there are m slots to be filled, and an item is selected according to pre-defined rules for each slot. For the first slot, randomly select an item from a designated content area (collection). For each subsequent slot, 1) Discard any enemies of items already administered in previous slots; 2) Draw a designated number of candidate items (selection length) from the designated collection according to the current ability estimate; 3) Randomly select one item from the set of candidates. There are two distinct features of the script method. First, a predetermined sequence of collections guarantees meeting content specifications. The specific ordering may be determined either randomly or deliberately by content experts. Second, steps 2 and 3 depict a method of exposure control, in which selection length balances item usage at the possible expense of ability estimation accuracy. The adaptation of the script method to OMST is straightforward. For the first module, randomly select each item from a designated collection. For each subsequent module, the process is the same as in scripted CAT (SCAT) except the same ability estimate is used for the selection of all items within the module. A series of simulations was conducted to evaluate the performance of scripted OMST (SOMST, with 3 or 4 evenly divided stages) relative to SCAT under various item exposure restrictions. In all conditions, reliability was maximized by programming an optimization algorithm that searches for the smallest possible selection length for each slot within the constraints. 
Preliminary results indicated that SOMST is certainly a capable design with performance comparable to that of SCAT. The encouraging findings and ease of implementation highly motivate the prospect of operational use for large-scale assessments.
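The slot-by-slot selection rules above can be sketched for a single slot as follows. This is an illustrative reading of the script method, not the authors' implementation: the item data layout and function name are assumptions, and ranking by closeness of item difficulty to the current theta stands in for whatever statistical criterion an operational program would use in step 2.

```python
import random

def select_for_slot(pool, collection, theta, selection_length, administered, rng):
    """Script-method selection for one slot: drop enemies of already
    administered items, rank the slot's designated collection by closeness
    of difficulty to the current theta, keep the top `selection_length`
    candidates, and pick one of them at random."""
    banned = {enemy for item in administered for enemy in item["enemies"]}
    eligible = [it for it in pool
                if it["collection"] == collection
                and it["id"] not in banned
                and it not in administered]
    eligible.sort(key=lambda it: abs(it["difficulty"] - theta))
    return rng.choice(eligible[:selection_length])
```

A larger `selection_length` spreads exposure across more items at the cost of sometimes administering a less informative one, which is the trade-off the optimization described above tunes slot by slot.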

Presentation Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1wKuAstITLXo6BM4APf2mPsth1BymNl-y %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T A Simulation Study to Compare Classification Method in Cognitive Diagnosis Computerized Adaptive Testing %A Jing Yang %A Jian Tao %A Hua-Hua Chang %A Ning-Zhong Shi %X

Cognitive diagnostic computerized adaptive testing (CD-CAT) combines the strengths of both CAT and cognitive diagnosis. Cognitive diagnosis models, which can be viewed as restricted latent class models, have been developed to classify examinees into the correct profile of mastered and non-mastered skills so that remediation can be more efficient. Chiu and Douglas (2013) introduced a nonparametric procedure that requires only the specification of a Q-matrix, classifying examinees by proximity to ideal response patterns. In this article, we compare the nonparametric procedure with common profile estimation methods such as maximum a posteriori (MAP) estimation in CD-CAT. Simulation studies consider a variety of Q-matrix structures, numbers of attributes, ways of generating attribute profiles, and levels of item quality. Results indicate that the nonparametric procedure consistently achieves higher pattern and attribute recovery rates in nearly all conditions.
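The idea of classifying by proximity to ideal response patterns can be sketched with a conjunctive (DINA-type) ideal-response rule: enumerate the attribute profiles, build each profile's ideal pattern from the Q-matrix, and classify by minimal Hamming distance. This is a minimal sketch with hypothetical function names; Chiu and Douglas (2013) also consider weighted distances, and the brute-force enumeration below is only practical for small numbers of attributes.

```python
from itertools import product

def ideal_response(profile, q_row):
    """Conjunctive (DINA-type) ideal response: correct iff the profile
    masters every attribute the item requires."""
    return int(all(a >= q for a, q in zip(profile, q_row)))

def classify(responses, Q):
    """Return the attribute profile whose ideal response pattern has the
    smallest Hamming distance to the observed responses."""
    n_attr = len(Q[0])
    best, best_dist = None, float("inf")
    for profile in product((0, 1), repeat=n_attr):
        ideal = [ideal_response(profile, row) for row in Q]
        dist = sum(r != e for r, e in zip(responses, ideal))
        if dist < best_dist:
            best, best_dist = profile, dist
    return best
```

Only the Q-matrix is needed, with no item parameters to calibrate, which is what makes the procedure attractive for small-sample or on-the-fly settings.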

References

Chiu, C.-Y., & Douglas, J. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250. doi: 10.1007/s00357-013-9132-9

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1jCL3fPZLgzIdwvEk20D-FliZ15OTUtpr %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Using Bayesian Decision Theory in Cognitive Diagnosis Computerized Adaptive Testing %A Chia-Ling Hsu %A Wen-Chung Wang %A ShuYing Chen %K Bayesian Decision Theory %K CD-CAT %X

Cognitive diagnosis computerized adaptive testing (CD-CAT) purports to provide each individual a profile of the strengths and weaknesses of attributes or skills through computerized adaptive testing. In the CD-CAT literature, researchers have been dedicated to developing item selection algorithms to improve measurement efficiency, and most algorithms are based on information theory. Motivated by the discontinuous nature of the latent variables in CD-CAT, this study introduces an alternative for item selection, called the minimum expected cost (MEC) method, derived from Bayesian decision theory. Using simulations, the MEC method was evaluated against the posterior-weighted Kullback-Leibler (PWKL) information, the modified PWKL (MPWKL), and the mutual information (MI) methods by manipulating item bank quality, item selection algorithm, and termination rule. Results indicated that, regardless of item quality and termination criterion, the MEC, MPWKL, and MI methods performed very similarly, and all outperformed the PWKL method in classification accuracy and test efficiency, especially in short tests; the MEC method also used the item bank more efficiently than the MPWKL and MI methods. Moreover, the MEC method can take the costs of incorrect decisions into account, improving classification accuracy and test efficiency when a particular profile is of concern. All the results suggest the practicability of the MEC method in CD-CAT.
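The Bayes-decision idea behind minimum-expected-cost selection can be sketched generically: for each candidate item, average the post-response Bayes risk over the possible responses, and administer the item with the smallest value. This illustrates the general idea under a discrete posterior over attribute profiles; it is not the authors' exact MEC formulation, and all names are hypothetical.

```python
def bayes_risk(posterior, cost):
    """Risk of the optimal classification decision under a posterior:
    the smallest expected cost over all candidate decisions d,
    where cost[d][p] is the cost of deciding d when p is true."""
    k = len(posterior)
    return min(sum(cost[d][p] * posterior[p] for p in range(k)) for d in range(k))

def mec_select(posterior, items, cost):
    """Pick the item minimizing the expected post-response Bayes risk.
    `items` maps an item id to its list of P(correct | profile)."""
    best_item, best_risk = None, float("inf")
    for item_id, p_correct in items.items():
        risk = 0.0
        for x in (0, 1):  # possible responses: incorrect, correct
            joint = [posterior[k] * (p_correct[k] if x else 1.0 - p_correct[k])
                     for k in range(len(posterior))]
            px = sum(joint)
            if px > 0:
                risk += px * bayes_risk([w / px for w in joint], cost)
        if risk < best_risk:
            best_item, best_risk = item_id, risk
    return best_item
```

Replacing the symmetric 0-1 cost matrix with asymmetric costs is what lets the method privilege a particular profile of concern, as described above.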

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata Japan %8 08/2017 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2016 %T Bayesian Networks in Educational Assessment: The State of the Field %A Culbertson, Michael J. %X Bayesian networks (BN) provide a convenient and intuitive framework for specifying complex joint probability distributions and are thus well suited for modeling content domains of educational assessments at a diagnostic level. BN have been used extensively in the artificial intelligence community as student models for intelligent tutoring systems (ITS) but have received less attention among psychometricians. This critical review outlines the existing research on BN in educational assessment, providing an introduction to the ITS literature for the psychometric community, and points out several promising research paths. The online appendix lists 40 assessment systems that serve as empirical examples of the use of BN for educational assessment in a variety of domains. 
%B Applied Psychological Measurement %V 40 %P 3-21 %U http://apm.sagepub.com/content/40/1/3.abstract %R 10.1177/0146621615590401 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2016 %T Effect of Imprecise Parameter Estimation on Ability Estimation in a Multistage Test in an Automatic Item Generation Context %A Colvin, Kimberly %A Keller, Lisa A %A Robin, Frederic %K Adaptive Testing %K automatic item generation %K errors in item parameters %K item clones %K multistage testing %B Journal of Computerized Adaptive Testing %V 4 %P 1-18 %G English %U http://iacat.org/jcat/index.php/jcat/article/view/59/27 %N 1 %R 10.7333/1608-040101 %0 Journal Article %J Applied Psychological Measurement %D 2016 %T High-Efficiency Response Distribution–Based Item Selection Algorithms for Short-Length Cognitive Diagnostic Computerized Adaptive Testing %A Zheng, Chanjin %A Chang, Hua-Hua %X Cognitive diagnostic computerized adaptive testing (CD-CAT) purports to obtain useful diagnostic information with great efficiency brought by CAT technology. Most of the existing CD-CAT item selection algorithms are evaluated when test length is fixed and relatively long, but some applications of CD-CAT, such as in interim assessment, require to obtain the cognitive pattern with a short test. The mutual information (MI) algorithm proposed by Wang is the first endeavor to accommodate this need. To reduce the computational burden, Wang provided a simplified scheme, but at the price of scale/sign change in the original index. As a result, it is very difficult to combine it with some popular constraint management methods. The current study proposes two high-efficiency algorithms, posterior-weighted cognitive diagnostic model (CDM) discrimination index (PWCDI) and posterior-weighted attribute-level CDM discrimination index (PWACDI), by modifying the CDM discrimination index. They can be considered as an extension of the Kullback–Leibler (KL) and posterior-weighted KL (PWKL) methods. 
A pre-calculation strategy has also been developed to address the computational issue. Simulation studies indicate that the newly developed methods can produce results comparable with or better than the MI and PWKL in both short and long tests. The other major advantage is that the computational issue has been addressed more elegantly than MI. PWCDI and PWACDI can run as fast as PWKL. More importantly, they do not suffer from the problem of scale/sign change as MI and, thus, can be used with constraint management methods together in a straightforward manner. %B Applied Psychological Measurement %V 40 %P 608-624 %U http://apm.sagepub.com/content/40/8/608.abstract %R 10.1177/0146621616665196 %0 Journal Article %J Journal of Educational Measurement %D 2016 %T Hybrid Computerized Adaptive Testing: From Group Sequential Design to Fully Sequential Design %A Wang, Shiyu %A Lin, Haiyan %A Chang, Hua-Hua %A Douglas, Jeff %X Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing.  Though most designs of CAT and MST exhibit strength and weakness in recent large-scale implementations, there is no simple answer to the question of which design is better because different modes may fit different practical situations. This article proposes a hybrid adaptive framework to combine both CAT and MST, inspired by an analysis of the history of CAT and MST. The proposed procedure is a design which transitions from a group sequential design to a fully sequential design. This allows for the robustness of MST in early stages, but also shares the advantages of CAT in later stages with fine tuning of the ability estimator once its neighborhood has been identified. 
Simulation results showed that hybrid designs following our proposed principles provided comparable or even better estimation accuracy and efficiency than standard CAT and MST designs, especially for examinees at the two ends of the ability range. %B Journal of Educational Measurement %V 53 %P 45–62 %U http://dx.doi.org/10.1111/jedm.12100 %R 10.1111/jedm.12100 %0 Journal Article %J Applied Psychological Measurement %D 2016 %T Optimal Reassembly of Shadow Tests in CAT %A Choi, Seung W. %A Moellering, Karin T. %A Li, Jie %A van der Linden, Wim J. %X Even in the age of abundant and fast computing resources, concurrency requirements for large-scale online testing programs still put an uninterrupted delivery of computer-adaptive tests at risk. In this study, to increase the concurrency for operational programs that use the shadow-test approach to adaptive testing, we explored various strategies aiming for reducing the number of reassembled shadow tests without compromising the measurement quality. Strategies requiring fixed intervals between reassemblies, a certain minimal change in the interim ability estimate since the last assembly before triggering a reassembly, and a hybrid of the two strategies yielded substantial reductions in the number of reassemblies without degradation in the measurement accuracy. The strategies effectively prevented unnecessary reassemblies due to adapting to the noise in the early test stages. They also highlighted the practicality of the shadow-test approach by minimizing the computational load involved in its use of mixed-integer programming. 
%B Applied Psychological Measurement %V 40 %P 469-485 %U http://apm.sagepub.com/content/40/7/469.abstract %R 10.1177/0146621616654597 %0 Journal Article %J Applied Psychological Measurement %D 2016 %T Parameter Drift Detection in Multidimensional Computerized Adaptive Testing Based on Informational Distance/Divergence Measures %A Kang, Hyeon-Ah %A Chang, Hua-Hua %X An informational distance/divergence-based approach is proposed to detect the presence of parameter drift in multidimensional computerized adaptive testing (MCAT). The study presents significance testing procedures for identifying changes in multidimensional item response functions (MIRFs) over time based on informational distance/divergence measures that capture the discrepancy between two probability functions. To approximate the MIRFs from the observed response data, the k-nearest neighbors algorithm is used with the random search method. A simulation study suggests that the distance/divergence-based drift measures perform effectively in identifying the instances of parameter drift in MCAT. They showed moderate power with small samples of 500 examinees and excellent power when the sample size was as large as 1,000. The proposed drift measures also adequately controlled for Type I error at the nominal level under the null hypothesis. %B Applied Psychological Measurement %V 40 %P 534-550 %U http://apm.sagepub.com/content/40/7/534.abstract %R 10.1177/0146621616663676 %0 Journal Article %J Journal of Educational Measurement %D 2015 %T Assessing Individual-Level Impact of Interruptions During Online Testing %A Sinharay, Sandip %A Wan, Ping %A Choi, Seung W. %A Kim, Dong-In %X With an increase in the number of online tests, the number of interruptions during testing due to unexpected technical issues seems to be on the rise. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. 
Researchers such as Hill and Sinharay et al. examined the impact of interruptions at an aggregate level. However, there is a lack of research on the assessment of impact of interruptions at an individual level. We attempt to fill that void. We suggest four methodological approaches, primarily based on statistical hypothesis testing, linear regression, and item response theory, which can provide evidence on the individual-level impact of interruptions. We perform a realistic simulation study to compare the Type I error rate and power of the suggested approaches. We then apply the approaches to data from the 2013 Indiana Statewide Testing for Educational Progress-Plus (ISTEP+) test that experienced interruptions. %B Journal of Educational Measurement %V 52 %P 80–105 %U http://dx.doi.org/10.1111/jedm.12064 %R 10.1111/jedm.12064 %0 Journal Article %J Educational and Psychological Measurement %D 2015 %T a-Stratified Computerized Adaptive Testing in the Presence of Calibration Error %A Cheng, Ying %A Patton, Jeffrey M. %A Shao, Can %X a-Stratified computerized adaptive testing with b-blocking (AST), as an alternative to the widely used maximum Fisher information (MFI) item selection method, can effectively balance item pool usage while providing accurate latent trait estimates in computerized adaptive testing (CAT). However, previous comparisons of these methods have treated item parameter estimates as if they are the true population parameter values. Consequently, capitalization on chance may occur. In this article, we examined the performance of the AST method under more realistic conditions where item parameter estimates instead of true parameter values are used in the CAT. Its performance was compared against that of the MFI method when the latter is used in conjunction with Sympson–Hetter or randomesque exposure control. Results indicate that the MFI method, even when combined with exposure control, is susceptible to capitalization on chance. 
This is particularly true when the calibration sample size is small. On the other hand, AST is more robust to capitalization on chance. Consistent with previous investigations using true item parameter values, AST yields much more balanced item pool usage, with a small loss in the precision of latent trait estimates. The loss is negligible when the test is as long as 40 items. %B Educational and Psychological Measurement %V 75 %P 260-283 %U http://epm.sagepub.com/content/75/2/260.abstract %R 10.1177/0013164414530719 %0 Journal Article %J Applied Psychological Measurement %D 2015 %T The Effect of Upper and Lower Asymptotes of IRT Models on Computerized Adaptive Testing %A Cheng, Ying %A Liu, Cheng %X In this article, the effect of the upper and lower asymptotes in item response theory models on computerized adaptive testing is shown analytically. This is done by deriving the step size between adjacent latent trait estimates under the four-parameter logistic model (4PLM) and two models it subsumes, the usual three-parameter logistic model (3PLM) and the 3PLM with upper asymptote (3PLMU). The authors show analytically that the large effect of the discrimination parameter on the step size holds true for the 4PLM and the two models it subsumes under both the maximum information method and the b-matching method for item selection. Furthermore, the lower asymptote helps reduce the positive bias of ability estimates associated with early guessing, and the upper asymptote helps reduce the negative bias induced by early slipping. Relative step size between modeling versus not modeling the upper or lower asymptote under the maximum Fisher information method (MI) and the b-matching method is also derived. It is also shown analytically why the gain from early guessing is smaller than the loss from early slipping when the lower asymptote is modeled, and vice versa when the upper asymptote is modeled. 
The benefit to loss ratio is quantified under both the MI and the b-matching method. Implications of the analytical results are discussed. %B Applied Psychological Measurement %V 39 %P 551-565 %U http://apm.sagepub.com/content/39/7/551.abstract %R 10.1177/0146621615585850 %0 Journal Article %J Applied Psychological Measurement %D 2015 %T Online Item Calibration for Q-Matrix in CD-CAT %A Chen, Yunxiao %A Liu, Jingchen %A Ying, Zhiliang %X

Item replenishment is important for maintaining a large-scale item bank. In this article, the authors consider calibrating new items based on pre-calibrated operational items under the deterministic inputs, noisy-and-gate model, the specification of which includes the so-called Q-matrix, as well as the slipping and guessing parameters. Making use of the maximum likelihood and Bayesian estimators for the latent knowledge states, the authors propose two methods for the calibration. These methods are applicable to both traditional paper–pencil–based tests, for which the selection of operational items is prefixed, and computerized adaptive tests, for which the selection of operational items is sequential and random. Extensive simulations are done to assess and to compare the performance of these approaches. Extensions to other diagnostic classification models are also discussed.

%B Applied Psychological Measurement %V 39 %P 5-15 %U http://apm.sagepub.com/content/39/1/5.abstract %R 10.1177/0146621613513065 %0 Journal Article %J Applied Psychological Measurement %D 2015 %T On-the-Fly Assembled Multistage Adaptive Testing %A Zheng, Yi %A Chang, Hua-Hua %X

Recently, multistage testing (MST) has been adopted by several important large-scale testing programs and become popular among practitioners and researchers. Stemming from the decades of history of computerized adaptive testing (CAT), the rapidly growing MST alleviates several major problems of earlier CAT applications. Nevertheless, MST is only one among all possible solutions to these problems. This article presents a new adaptive testing design, “on-the-fly assembled multistage adaptive testing” (OMST), which combines the benefits of CAT and MST and offsets their limitations. Moreover, OMST also provides some unique advantages over both CAT and MST. A simulation study was conducted to compare OMST with MST and CAT, and the results demonstrated the promising features of OMST. Finally, the “Discussion” section provides suggestions on possible future adaptive testing designs based on the OMST framework, which could provide great flexibility for adaptive tests in the digital future and open an avenue for all types of hybrid designs based on the different needs of specific tests.

%B Applied Psychological Measurement %V 39 %P 104-118 %U http://apm.sagepub.com/content/39/2/104.abstract %R 10.1177/0146621614544519 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2014 %T Cognitive Diagnostic Models and Computerized Adaptive Testing: Two New Item-Selection Methods That Incorporate Response Times %A Finkelman, M. D. %A Kim, W. %A Weissman, A. %A Cook, R.J. %B Journal of Computerized Adaptive Testing %V 2 %P 59-76 %G English %U http://www.iacat.org/jcat/index.php/jcat/article/view/43/21 %N 4 %R 10.7333/1412-0204059 %0 Journal Article %J Applied Psychological Measurement %D 2014 %T Computerized Adaptive Testing for the Random Weights Linear Logistic Test Model %A Crabbe, Marjolein %A Vandebroek, Martina %X

This article discusses four item selection rules to design efficient individualized tests for the random weights linear logistic test model (RWLLTM): minimum posterior-weighted -error, minimum expected posterior-weighted -error, maximum expected Kullback–Leibler divergence between subsequent posteriors (KLP), and maximum mutual information (MUI). The RWLLTM decomposes test items into a set of subtasks or cognitive features and assumes individual-specific effects of the features on the difficulty of the items. The model extends and improves the well-known linear logistic test model, in which feature effects are only estimated at the aggregate level. Simulations show that the efficiencies of the designs obtained with the different criteria appear to be equivalent. However, KLP and MUI are given preference over the two posterior-weighted error criteria due to their lesser complexity, which significantly reduces the computational burden.

%B Applied Psychological Measurement %V 38 %P 415-431 %U http://apm.sagepub.com/content/38/6/415.abstract %R 10.1177/0146621614533987 %0 Journal Article %J Journal of Educational Measurement %D 2014 %T Determining the Overall Impact of Interruptions During Online Testing %A Sinharay, Sandip %A Wan, Ping %A Whitaker, Mike %A Kim, Dong-In %A Zhang, Litong %A Choi, Seung W. %X

With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees’ scores. There is a lack of research on this topic due to the novelty of the problem. This article is an attempt to fill that void. Several methods, primarily based on propensity score matching, linear regression, and item response theory, were suggested to determine the overall impact of the interruptions on the examinees’ scores. A realistic simulation study shows that the suggested methods have satisfactory Type I error rate and power. Then the methods were applied to data from the Indiana Statewide Testing for Educational Progress-Plus (ISTEP+) test that experienced interruptions in 2013. The results indicate that the interruptions did not have a significant overall impact on the student scores for the ISTEP+ test.

%B Journal of Educational Measurement %V 51 %P 419–440 %U http://dx.doi.org/10.1111/jedm.12052 %R 10.1111/jedm.12052 %0 Journal Article %J Journal of Educational Measurement %D 2014 %T An Enhanced Approach to Combine Item Response Theory With Cognitive Diagnosis in Adaptive Testing %A Wang, Chun %A Zheng, Chanjin %A Chang, Hua-Hua %X

Computerized adaptive testing offers the possibility of gaining information on both the overall ability and cognitive profile in a single assessment administration. Some algorithms aiming for these dual purposes have been proposed, including the shadow test approach, the dual information method (DIM), and the constraint weighted method. The current study proposed two new methods, aggregate ranked information index (ARI) and aggregate standardized information index (ASI), which appropriately addressed the noncompatibility issue inherent in the original DIM method. More flexible weighting schemes that put different emphasis on information about general ability (i.e., θ in item response theory) and information about cognitive profile (i.e., α in cognitive diagnostic modeling) were also explored. Two simulation studies were carried out to investigate the effectiveness of the new methods and weighting schemes. Results showed that the new methods with the flexible weighting schemes could produce more accurate estimation of both overall ability and cognitive profile than the original DIM. Among them, the ASI with both empirical and theoretical weights is recommended, and attribute-level weighting scheme is preferred if some attributes are considered more important from a substantive perspective.

%B Journal of Educational Measurement %V 51 %P 358–380 %U http://dx.doi.org/10.1111/jedm.12057 %R 10.1111/jedm.12057 %0 Journal Article %J Applied Psychological Measurement %D 2014 %T Enhancing Pool Utilization in Constructing the Multistage Test Using Mixed-Format Tests %A Park, Ryoungsun %A Kim, Jiseon %A Chung, Hyewon %A Dodd, Barbara G. %X

This study investigated a new pool utilization method of constructing multistage tests (MST) using the mixed-format test based on the generalized partial credit model (GPCM). MST simulations of a classification test were performed to evaluate the MST design. A linear programming (LP) model was applied to perform MST reassemblies based on the initial MST construction. Three subsequent MST reassemblies were performed. For each reassembly, three test unit replacement ratios (TRRs; 0.22, 0.44, and 0.66) were investigated. The conditions of the three passing rates (30%, 50%, and 70%) were also considered in the classification testing. The results demonstrated that various MST reassembly conditions increased the overall pool utilization rates, while maintaining the desired MST construction. All MST testing conditions performed equally well in terms of the precision of the classification decision.

%B Applied Psychological Measurement %V 38 %P 268-280 %U http://apm.sagepub.com/content/38/4/268.abstract %R 10.1177/0146621613515545 %0 Journal Article %J Applied Psychological Measurement %D 2014 %T General Test Overlap Control: Improved Algorithm for CAT and CCT %A Chen, Shu-Ying %A Lei, Pui-Wa %A Chen, Jyun-Hong %A Liu, Tzu-Chen %X

This article proposed a new online test overlap control algorithm that is an improvement of Chen’s algorithm in controlling general test overlap rate for item pooling among a group of examinees. Chen’s algorithm is not very efficient in that it controls not only item pooling between the current examinee and prior examinees but also item pooling between previous examinees, which was already controlled when those examinees were current. The proposed improvement increases efficiency by only considering item pooling between current and previous examinees, and its improved performance over Chen’s algorithm is demonstrated in a simulated computerized adaptive testing (CAT) environment. Moreover, the proposed algorithm is adapted for computerized classification testing (CCT) using the sequential probability ratio test procedure and is evaluated against some existing exposure control procedures. The proposed algorithm appears to work best in controlling general test overlap rate among the exposure control procedures examined without sacrificing much classification precision, though longer tests might be required for more stringent control of item pooling among larger groups. Given the capability of the proposed algorithm in controlling item pooling among a group of examinees of any size and its ease of implementation, it appears to be a good test overlap control method.

%B Applied Psychological Measurement %V 38 %P 229-244 %U http://apm.sagepub.com/content/38/3/229.abstract %R 10.1177/0146621613513494 %0 Journal Article %J Applied Psychological Measurement %D 2014 %T A Numerical Investigation of the Recovery of Point Patterns With Minimal Information %A Cox, M. A. A. %X

A method has been proposed (Tsogo et al., 2001) to reconstruct the geometrical configuration of a large point set using minimal information. This paper employs numerical examples to investigate the proposed procedure. The suggested method has two great advantages: it reduces the volume of the data collection exercise, and it eases the computational effort involved in analyzing the data. It is suggested, however, that the method, while possibly providing a useful starting point for a solution, is not a panacea.

%B Applied Psychological Measurement %V 38 %P 329-335 %U http://apm.sagepub.com/content/38/4/329.abstract %R 10.1177/0146621613516186 %0 Journal Article %J Applied Psychological Measurement %D 2013 %T Deriving Stopping Rules for Multidimensional Computerized Adaptive Testing %A Wang, Chun %A Chang, Hua-Hua %A Boughton, Keith A. %X

Multidimensional computerized adaptive testing (MCAT) is able to provide a vector of ability estimates for each examinee, which could be used to provide a more informative profile of an examinee’s performance. The current literature on MCAT focuses on the fixed-length tests, which can generate less accurate results for those examinees whose abilities are quite different from the average difficulty level of the item bank when there are only a limited number of items in the item bank. Therefore, instead of stopping the test with a predetermined fixed test length, the authors use a more informative stopping criterion that is directly related to measurement accuracy. Specifically, this research derives four stopping rules that either quantify the measurement precision of the ability vector (i.e., minimum determinant rule [D-rule], minimum eigenvalue rule [E-rule], and maximum trace rule [T-rule]) or quantify the amount of available information carried by each item (i.e., maximum Kullback–Leibler divergence rule [K-rule]). The simulation results showed that all four stopping rules successfully terminated the test when the mean squared error of ability estimation is within a desired range, regardless of examinees’ true abilities. It was found that when using the D-, E-, or T-rule, examinees with extreme abilities tended to have tests that were twice as long as the tests received by examinees with moderate abilities. However, the test length difference with K-rule is not very dramatic, indicating that K-rule may not be very sensitive to measurement precision. In all cases, the cutoff value for each stopping rule needs to be adjusted on a case-by-case basis to find an optimal solution.

%B Applied Psychological Measurement %V 37 %P 99-122 %U http://apm.sagepub.com/content/37/2/99.abstract %R 10.1177/0146621612463422 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2013 %T Estimating Measurement Precision in Reduced-Length Multi-Stage Adaptive Testing %A Crotts, K.M. %A Zenisky, A. L. %A Sireci, S.G. %A Li, X. %B Journal of Computerized Adaptive Testing %V 1 %P 67-87 %G English %N 4 %R 10.7333/1309-0104067 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2013 %T A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing %A Wang, Chun %A Fan, Zhewen %A Chang, Hua-Hua %A Douglas, Jeffrey A. %X

The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees’ responses to each item, we can investigate the amount of time examinees spend on each item. Current models for RTs mainly focus on parametric models, which have the advantage of conciseness, but may suffer from reduced flexibility to fit real data. We propose a semiparametric approach, specifically, the Cox proportional hazards model with a latent speed covariate to model the RTs, embedded within the hierarchical framework proposed by van der Linden to model the RTs and response accuracy simultaneously. This semiparametric approach combines the flexibility of nonparametric modeling and the brevity and interpretability of the parametric modeling. A Markov chain Monte Carlo method for parameter estimation is given and may be used with sparse data obtained by computerized adaptive testing. Both simulation studies and real data analysis are carried out to demonstrate the applicability of the new model.

%B Journal of Educational and Behavioral Statistics %V 38 %P 381-417 %U http://jeb.sagepub.com/cgi/content/abstract/38/4/381 %R 10.3102/1076998612461831 %0 Journal Article %J Applied Psychological Measurement %D 2013 %T Variable-Length Computerized Adaptive Testing Based on Cognitive Diagnosis Models %A Hsu, Chia-Ling %A Wang, Wen-Chung %A Chen, Shu-Ying %X

Interest in developing computerized adaptive testing (CAT) under cognitive diagnosis models (CDMs) has increased recently. CAT algorithms that use a fixed-length termination rule frequently lead to different degrees of measurement precision for different examinees. Fixed precision, in which the examinees receive the same degree of measurement precision, is a major advantage of CAT over nonadaptive testing. In addition to the precision issue, test security is another important issue in practical CAT programs. In this study, the authors implemented two termination criteria for the fixed-precision rule and evaluated their performance under two popular CDMs using simulations. The results showed that using the two criteria with the posterior-weighted Kullback–Leibler information procedure for selecting items could achieve the prespecified measurement precision. A control procedure was developed to control item exposure and test overlap simultaneously among examinees. The simulation results indicated that in contrast to no method of controlling exposure, the control procedure developed in this study could maintain item exposure and test overlap at the prespecified level at the expense of only a few more items.

%B Applied Psychological Measurement %V 37 %P 563-582 %U http://apm.sagepub.com/content/37/7/563.abstract %R 10.1177/0146621613488642 %0 Journal Article %J BMC Med Res Methodol %D 2012 %T Comparison of two Bayesian methods to detect mode effects between paper-based and computerized adaptive assessments: a preliminary Monte Carlo study. %A Riley, Barth B %A Carle, Adam C %K Bayes Theorem %K Data Interpretation, Statistical %K Humans %K Mathematical Computing %K Monte Carlo Method %K Outcome Assessment (Health Care) %X

BACKGROUND: Computerized adaptive testing (CAT) is being applied to health outcome measures developed as paper-and-pencil (P&P) instruments. Differences in how respondents answer items administered by CAT vs. P&P can increase error in CAT-estimated measures if not identified and corrected.

METHOD: Two methods for detecting item-level mode effects are proposed using Bayesian estimation of posterior distributions of item parameters: (1) a modified robust Z (RZ) test, and (2) 95% credible intervals (CrI) for the CAT-P&P difference in item difficulty. A simulation study was conducted under the following conditions: (1) data-generating model (one- vs. two-parameter IRT model); (2) moderate vs. large DIF sizes; (3) percentage of DIF items (10% vs. 30%), and (4) mean difference in θ estimates across modes of 0 vs. 1 logits. This resulted in a total of 16 conditions with 10 generated datasets per condition.

RESULTS: Both methods evidenced good to excellent false positive control, with RZ providing better control of false positives and with slightly higher power for CrI, irrespective of measurement model. False positives increased when items were very easy to endorse and when there were mode differences in mean trait level. True positives were predicted by CAT item usage, absolute item difficulty, and item discrimination. RZ outperformed CrI due to better control of false positive DIF.

CONCLUSIONS: Whereas false positives were well controlled, particularly for RZ, power to detect DIF was suboptimal. Research is needed to examine the robustness of these methods under varying prior assumptions concerning the distribution of item and person parameters and when data fail to conform to prior assumptions. False identification of DIF when items were very easy to endorse is a problem warranting additional investigation.

%B BMC Med Res Methodol %V 12 %P 124 %8 2012 %G eng %R 10.1186/1471-2288-12-124 %0 Journal Article %J Applied Psychological Measurement %D 2012 %T Computerized Adaptive Testing Using a Class of High-Order Item Response Theory Models %A Huang, Hung-Yu %A Chen, Po-Hsi %A Wang, Wen-Chung %X

In the human sciences, a common assumption is that latent traits have a hierarchical structure. Higher order item response theory models have been developed to account for this hierarchy. In this study, computerized adaptive testing (CAT) algorithms based on these kinds of models were implemented, and their performance under a variety of situations was examined using simulations. The results showed that the CAT algorithms were very effective. The progressive method for item selection, the Sympson and Hetter method with online and freeze procedure for item exposure control, and the multinomial model for content balancing can simultaneously maintain good measurement precision, item exposure control, content balance, test security, and pool usage.

%B Applied Psychological Measurement %V 36 %P 689-706 %U http://apm.sagepub.com/content/36/8/689.abstract %R 10.1177/0146621612459552 %0 Journal Article %J Applied Psychological Measurement %D 2012 %T An Empirical Evaluation of the Slip Correction in the Four Parameter Logistic Models With Computerized Adaptive Testing %A Yen, Yung-Chin %A Ho, Rong-Guey %A Laio, Wen-Wei %A Chen, Li-Ju %A Kuo, Ching-Chin %X

In a selected response test, aberrant responses such as careless errors and lucky guesses might cause error in ability estimation because these responses do not actually reflect the knowledge that examinees possess. In a computerized adaptive test (CAT), these aberrant responses could further cause serious estimation error due to dynamic item administration. To enhance the robust performance of CAT against aberrant responses, Barton and Lord proposed the four-parameter logistic (4PL) item response theory (IRT) model. However, most studies relevant to the 4PL IRT model were conducted based on simulation experiments. This study attempts to investigate the performance of the 4PL IRT model as a slip-correction mechanism with an empirical experiment. The results showed that the 4PL IRT model could not only reduce the problematic underestimation of the examinees’ ability introduced by careless mistakes in practical situations but also improve measurement efficiency.

%B Applied Psychological Measurement %V 36 %P 75-87 %U http://apm.sagepub.com/content/36/2/75.abstract %R 10.1177/0146621611432862 %0 Journal Article %J Journal of Educational Measurement %D 2012 %T Investigating the Effect of Item Position in Computer-Based Tests %A Li, Feiming %A Cohen, Allan %A Shen, Linjun %X

Computer-based tests (CBTs) often use random ordering of items in order to minimize item exposure and reduce the potential for answer copying. Little research has been done, however, to examine item position effects for these tests. In this study, different versions of a Rasch model and different response time models were examined and applied to data from a CBT administration of a medical licensure examination. The models specifically were used to investigate whether item position affected item difficulty and item intensity estimates. Results indicated that the position effect was negligible.

%B Journal of Educational Measurement %V 49 %P 362–379 %U http://dx.doi.org/10.1111/j.1745-3984.2012.00181.x %R 10.1111/j.1745-3984.2012.00181.x %0 Journal Article %J Applied Psychological Measurement %D 2012 %T A Mixture Rasch Model–Based Computerized Adaptive Test for Latent Class Identification %A Hong Jiao, %A Macready, George %A Liu, Junhui %A Cho, Youngmi %X

This study explored a computerized adaptive test delivery algorithm for latent class identification based on the mixture Rasch model. Four item selection methods based on the Kullback–Leibler (KL) information were proposed and compared with the reversed and the adaptive KL information under simulated testing conditions. When item separation was large, all item selection methods did not differ evidently in terms of accuracy in classifying examinees into different latent classes and estimating latent ability. However, when item separation was small, two methods with class-specific ability estimates performed better than the other two methods based on a single latent ability estimate across all latent classes. The three types of KL information distributions were compared. The KL and the reversed KL information could be the same or different depending on the ability level and the item difficulty difference between latent classes. Although the KL information and the reversed KL information were different at some ability levels and item difficulty difference levels, the use of the KL, the reversed KL, or the adaptive KL information did not affect the results substantially due to the symmetric distribution of item difficulty differences between latent classes in the simulated item pools. Item pool usage and classification convergence points were examined as well.

%B Applied Psychological Measurement %V 36 %P 469-493 %U http://apm.sagepub.com/content/36/6/469.abstract %R 10.1177/0146621612450068 %0 Journal Article %J Educational and Psychological Measurement %D 2012 %T Panel Design Variations in the Multistage Test Using the Mixed-Format Tests %A Kim, Jiseon %A Chung, Hyewon %A Dodd, Barbara G. %A Park, Ryoungsun %X

This study compared various panel designs of the multistage test (MST) using mixed-format tests in the context of classification testing. Simulations varied the design of the first-stage module. The first stage was constructed according to three levels of test information functions (TIFs) with three different TIF centers. Additional computerized adaptive test (CAT) conditions provided baseline comparisons. Three passing rate conditions were also included. The various MST conditions using mixed-format tests were constructed properly and performed well. When the levels of TIFs at the first stage were higher, the simulations produced a greater number of correct classifications. CAT with the randomesque-10 procedure yielded comparable results to the MST with increased levels of TIFs. Finally, all MST conditions achieved better test security results compared with CAT’s maximum information conditions.

%B Educational and Psychological Measurement %V 72 %P 574-588 %U http://epm.sagepub.com/content/72/4/574.abstract %R 10.1177/0013164411428977 %0 Journal Article %J Educational and Psychological Measurement %D 2012 %T On the Reliability and Validity of a Numerical Reasoning Speed Dimension Derived From Response Times Collected in Computerized Testing %A Davison, Mark L. %A Semmes, Robert %A Huang, Lan %A Close, Catherine N. %X

Data from 181 college students were used to assess whether math reasoning item response times in computerized testing can provide valid and reliable measures of a speed dimension. The alternate forms reliability of the speed dimension was .85. A two-dimensional structural equation model suggests that the speed dimension is related to the accuracy of speeded responses. Speed factor scores were significantly correlated with performance on the ACT math scale. Results suggest that the speed dimension underlying response times can be reliably measured and that the dimension is related to the accuracy of performance under the pressure of time limits.

%B Educational and Psychological Measurement %V 72 %P 245-263 %U http://epm.sagepub.com/content/72/2/245.abstract %R 10.1177/0013164411408412 %0 Journal Article %J Psychiatry Research %D 2011 %T Applying computerized adaptive testing to the CES-D scale: A simulation study %A Smits, N. %A Cuijpers, P. %A van Straten, A. %B Psychiatry Research %V 188 %P 147–155 %N 1 %0 Journal Article %J Psychiatry Research %D 2011 %T Applying computerized adaptive testing to the CES-D scale: A simulation study %A Smits, N. %A Cuijpers, P. %A van Straten, A. %X In this paper we studied the appropriateness of developing an adaptive version of the Center for Epidemiologic Studies Depression (CES-D; Radloff, 1977) scale. Computerized Adaptive Testing (CAT) involves the computerized administration of a test in which each item is dynamically selected from a pool of items until a pre-specified measurement precision is reached. Two types of analyses were performed using the CES-D responses of a large sample of adolescents (N=1392). First, it was shown that the items met the psychometric requirements needed for CAT. Second, CATs were simulated by using the existing item responses as if they had been collected adaptively. CATs selecting only a small number of items gave results which, in terms of depression measurement and criterion validity, were only marginally different from the results of full CES-D assessment. It was concluded that CAT is a very fruitful way of improving the efficiency of the CES-D questionnaire. The discussion addresses the strengths and limitations of the application of CAT in mental health research.
%B Psychiatry Research %7 2011/01/07 %8 Jan 3 %@ 0165-1781 (Print)0165-1781 (Linking) %G Eng %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T Building Affordable CD-CAT Systems for Schools To Address Today's Challenges In Assessment %A Chang, Hua-Hua %K affordability %K CAT %K cost %B Annual Conference of the International Association for Computerized Adaptive Testing %G eng %0 Journal Article %J International Journal of Testing %D 2011 %T Computerized Adaptive Testing with the Zinnes and Griggs Pairwise Preference Ideal Point Model %A Stark, Stephen %A Chernyshenko, Oleksandr S. %B International Journal of Testing %V 11 %P 231-247 %U http://www.tandfonline.com/doi/abs/10.1080/15305058.2011.561459 %R 10.1080/15305058.2011.561459 %0 Generic %D 2011 %T Cross-cultural development of an item list for computer-adaptive testing of fatigue in oncological patients %A Giesinger, J. M. %A Petersen, M. A. %A Groenvold, M. %A Aaronson, N. K. %A Arraras, J. I. %A Conroy, T. %A Gamper, E. M. %A Kemmler, G. %A King, M. T. %A Oberguggenberger, A. S. %A Velikova, G. %A Young, T. %A Holzner, B. %A Eortc-Qlg, E. O. %X INTRODUCTION: Within an ongoing project of the EORTC Quality of Life Group, we are developing computerized adaptive test (CAT) measures for the QLQ-C30 scales. These new CAT measures are conceptualised to reflect the same constructs as the QLQ-C30 scales. Accordingly, the Fatigue-CAT is intended to capture physical and general fatigue. METHODS: The EORTC approach to CAT development comprises four phases (literature search, operationalisation, pre-testing, and field testing). Phases I-III are described in detail in this paper. A literature search for fatigue items was performed in major medical databases. After refinement through several expert panels, the remaining items were used as the basis for adapting items and/or formulating new items fitting the EORTC item style.
To obtain feedback from patients with cancer, these English items were translated into Danish, French, German, and Spanish and tested in the respective countries. RESULTS: Based on the literature search a list containing 588 items was generated. After a comprehensive item selection procedure focusing on content, redundancy, item clarity and item difficulty a list of 44 fatigue items was generated. Patient interviews (n=52) resulted in 12 revisions of wording and translations. DISCUSSION: The item list developed in phases I-III will be further investigated within a field-testing phase (IV) to examine psychometric characteristics and to fit an item response theory model. The Fatigue CAT based on this item bank will provide scores that are backward-compatible to the original QLQ-C30 fatigue scale. %B Health and Quality of Life Outcomes %7 2011/03/31 %V 9 %P 10 %8 March 29, 2011 %@ 1477-7525 (Electronic)1477-7525 (Linking) %G Eng %M 21447160 %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T Detecting DIF between Conventional and Computerized Adaptive Testing: A Monte Carlo Study %A Barth B. Riley %A Adam C. Carle %K 95% Credible Interval %K CAT %K DIF %K differential item function %K modified robust Z statistic %K Monte Carlo methodologies %X

Two procedures, the Modified Robust Z statistic and the 95% Credible Interval, were compared in a Monte Carlo study. Both procedures evidenced adequate control of false positive DIF results.

%B Annual Conference of the International Association for Computerized Adaptive Testing %8 10/2011 %G eng %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T A Heuristic Of CAT Item Selection Procedure For Testlets %A Yuehmei Chien %A David Shin %A Walter Denny Way %K CAT %K shadow test %K testlets %B Annual Conference of the International Association for Computerized Adaptive Testing %G eng %0 Generic %D 2011 %T Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): depression, anxiety, and anger %A Pilkonis, P. A. %A Choi, S. W. %A Reise, S. P. %A Stover, A. M. %A Riley, W. T. %A Cella, D. %B Assessment %@ 1073-1911 %G eng %& June 21, 2011 %0 Journal Article %J BMC Medical Informatics and Decision Making %D 2011 %T A new adaptive testing algorithm for shortening health literacy assessments %A Kandula, S. %A Ancker, J.S. %A Kaufman, D.R. %A Currie, L.M. %A Qing, Z.-T. %X

 

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3178473/?tool=pmcentrez
%B BMC Medical Informatics and Decision Making %V 11 %G English %N 52 %R 10.1186/1472-6947-11-52 %0 Journal Article %J Educational and Psychological Measurement %D 2011 %T A New Stopping Rule for Computerized Adaptive Testing %A Choi, Seung W. %A Grady, Matthew W. %A Dodd, Barbara G. %X

The goal of the current study was to introduce a new stopping rule for computerized adaptive testing (CAT). The predicted standard error reduction (PSER) stopping rule uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared with that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant.

%B Educational and Psychological Measurement %V 71 %P 37-53 %U http://epm.sagepub.com/content/71/1/37.abstract %R 10.1177/0013164410387338 %0 Journal Article %J Journal of Educational Measurement %D 2011 %T Restrictive Stochastic Item Selection Methods in Cognitive Diagnostic Computerized Adaptive Testing %A Wang, Chun %A Chang, Hua-Hua %A Huebner, Alan %X

This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior weighted Kullback-Leibler (KL) information index but include additional stochastic components either in the item selection index or in the item selection procedure. Simulation studies show that both methods are successful at simultaneously suppressing overexposed items and increasing the usage of underexposed items. Compared to item selection based upon (1) pure KL information and (2) the Sympson-Hetter method, the two new methods strike a better balance between item exposure control and measurement accuracy. The two new methods are also compared with Barrada et al.'s (2008) progressive method and proportional method.

%B Journal of Educational Measurement %V 48 %P 255–273 %U http://dx.doi.org/10.1111/j.1745-3984.2011.00145.x %R 10.1111/j.1745-3984.2011.00145.x %0 Journal Article %J Applied Psychological Measurement %D 2010 %T A comparison of content-balancing procedures for estimating multiple clinical domains in computerized adaptive testing: Relative precision, validity, and detection of persons with misfitting responses %A Riley, B. B. %A Dennis, M. L. %A Conrad, K. J. %X This simulation study sought to compare four different computerized adaptive testing (CAT) content-balancing procedures designed for use in a multidimensional assessment with respect to measurement precision, symptom severity classification, validity of clinical diagnostic recommendations, and sensitivity to atypical responding. The four content-balancing procedures were (a) no content balancing, (b) screener-based, (c) mixed (screener plus content balancing), and (d) full content balancing. In full content balancing and in mixed content balancing following administration of the screener items, item selection was based on (a) whether the target number of items for the item’s subscale was reached and (b) the item’s information function. Mixed and full content balancing provided the best representation of items from each of the main subscales of the Internal Mental Distress Scale. These procedures also resulted in higher CAT to full-scale correlations for the Trauma and Homicidal/Suicidal Thought subscales and improved detection of atypical responding. %B Applied Psychological Measurement %V 34 %P 410-423 %@ 0146-6216 (Print)1552-3497 (Linking) %G eng %0 Journal Article %J Applied Psychological Measurement %D 2010 %T A Comparison of Content-Balancing Procedures for Estimating Multiple Clinical Domains in Computerized Adaptive Testing: Relative Precision, Validity, and Detection of Persons With Misfitting Responses %A Barth B. Riley %A Michael L. Dennis %A Conrad, Kendon J. %X

This simulation study sought to compare four different computerized adaptive testing (CAT) content-balancing procedures designed for use in a multidimensional assessment with respect to measurement precision, symptom severity classification, validity of clinical diagnostic recommendations, and sensitivity to atypical responding. The four content-balancing procedures were (a) no content balancing, (b) screener-based, (c) mixed (screener plus content balancing), and (d) full content balancing. In full content balancing and in mixed content balancing following administration of the screener items, item selection was based on (a) whether the target number of items for the item’s subscale was reached and (b) the item’s information function. Mixed and full content balancing provided the best representation of items from each of the main subscales of the Internal Mental Distress Scale. These procedures also resulted in higher CAT to full-scale correlations for the Trauma and Homicidal/Suicidal Thought subscales and improved detection of atypical responding.

%B Applied Psychological Measurement %V 34 %P 410-423 %U http://apm.sagepub.com/content/34/6/410.abstract %R 10.1177/0146621609349802 %0 Journal Article %J Educational Technology & Society %D 2010 %T Development and evaluation of a confidence-weighting computerized adaptive testing %A Yen, Y. C. %A Ho, R. G. %A Chen, L. J. %A Chou, K. Y. %A Chen, Y. L. %B Educational Technology & Society %V 13(3) %P 163–176 %G eng %0 Journal Article %J Quality of Life Research %D 2010 %T Development of computerized adaptive testing (CAT) for the EORTC QLQ-C30 physical functioning dimension %A Petersen, M. A. %A Groenvold, M. %A Aaronson, N. K. %A Chie, W. C. %A Conroy, T. %A Costantini, A. %A Fayers, P. %A Helbostad, J. %A Holzner, B. %A Kaasa, S. %A Singer, S. %A Velikova, G. %A Young, T. %X PURPOSE: Computerized adaptive test (CAT) methods, based on item response theory (IRT), enable a patient-reported outcome instrument to be adapted to the individual patient while maintaining direct comparability of scores. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30. We present the development and psychometric validation of the item pool for the first of the scales, physical functioning (PF). METHODS: Initial developments (including literature search and patient and expert evaluations) resulted in 56 candidate items. Responses to these items were collected from 1,176 patients with cancer from Denmark, France, Germany, Italy, Taiwan, and the United Kingdom. The items were evaluated with regard to psychometric properties. RESULTS: Evaluations showed that 31 of the items could be included in a unidimensional IRT model with acceptable fit and good content coverage, although the pool may lack items at the upper extreme (good PF). There were several findings of significant differential item functioning (DIF). However, the DIF findings appeared to have little impact on the PF estimation. 
CONCLUSIONS: We have established an item pool for CAT measurement of PF and believe that this CAT instrument will clearly improve the EORTC measurement of PF. %B Quality of Life Research %7 2010/10/26 %V 20 %P 479-490 %@ 1573-2649 (Electronic)0962-9343 (Linking) %G Eng %M 20972628 %0 Journal Article %J Quality of Life Research %D 2010 %T Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms %A Choi, S. %A Reise, S. P. %A Pilkonis, P. A. %A Hays, R. D. %A Cella, D. %B Quality of Life Research %V 19(1) %P 125–136 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2010 %T A new stopping rule for computerized adaptive testing %A Choi, S. W. %A Grady, M. W. %A Dodd, B. G. %X The goal of the current study was to introduce a new stopping rule for computerized adaptive testing. The predicted standard error reduction stopping rule (PSER) uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared to that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant. %B Educational and Psychological Measurement %7 2011/02/01 %V 70 %P 1-17 %8 Dec 1 %@ 0013-1644 (Print)0013-1644 (Linking) %G Eng %M 21278821 %2 3028267 %0 Journal Article %J Psychometrika %D 2010 %T Online calibration via variable length computerized adaptive testing %A Chang, Y. I. %A Lu, H. Y. %X Item calibration is an essential issue in modern item response theory based psychological or educational testing. 
With the popularity of computerized adaptive testing, methods to efficiently calibrate new items have become more important than they were when paper-and-pencil test administration was the norm. Many calibration processes have been proposed and discussed from both theoretical and practical perspectives. Among them, online calibration may be one of the most cost-effective. In this paper, under a variable length computerized adaptive testing scenario, we integrate the methods of adaptive design, sequential estimation, and measurement error models to solve online item calibration problems. The proposed sequential estimate of item parameters is shown to be strongly consistent and asymptotically normally distributed with a prechosen accuracy. Numerical results show that the proposed method is very promising in terms of both estimation accuracy and efficiency. The results of using calibrated items to estimate the latent trait levels are also reported. %B Psychometrika %V 75 %P 140-157 %@ 0033-3123 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2010 %T A Procedure for Controlling General Test Overlap in Computerized Adaptive Testing %A Chen, Shu-Ying %X

To date, exposure control procedures that are designed to control test overlap in computerized adaptive tests (CATs) are based on the assumption of item sharing between pairs of examinees. However, in practice, examinees may obtain test information from more than one previous test taker. This larger scope of information sharing needs to be considered in conducting test overlap control. The purpose of this study is to propose a test overlap control method such that the proportion of overlapping items encountered by an examinee with a group of previous examinees (described as general test overlap rate) can be controlled. Results indicated that item exposure rate and general test overlap rate could be simultaneously controlled by implementing the procedure. In addition, these two indices were controlled on the fly without any iterative simulations conducted prior to operational CATs. Thus, the proposed procedure would be an efficient method for controlling both the item exposure and general test overlap in CATs.

%B Applied Psychological Measurement %V 34 %P 393-409 %U http://apm.sagepub.com/content/34/6/393.abstract %R 10.1177/0146621610367788 %0 Journal Article %J Journal of Educational Measurement %D 2010 %T Stratified and Maximum Information Item Selection Procedures in Computer Adaptive Testing %A Deng, Hui %A Ansley, Timothy %A Chang, Hua-Hua %X

In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with completely random item selection (RAN). The comparisons were with respect to error variances, reliability of ability estimates and item usage through CATs simulated under nine test conditions of various practical constraints and item selection space. The results showed that F had an apparent precision advantage over STR and USTR under unconstrained item selection, but with very poor item usage. USTR reduced error variances for STR under various conditions, with small compromises in item usage. Compared to F, USTR enhanced item usage while achieving comparable precision in ability estimates; it achieved a precision level similar to F with improved item usage when items were selected under exposure control and with limited item selection space. The results provide implications for choosing an appropriate item selection procedure in applied settings.

%B Journal of Educational Measurement %V 47 %P 202–226 %U http://dx.doi.org/10.1111/j.1745-3984.2010.00109.x %R 10.1111/j.1745-3984.2010.00109.x %0 Journal Article %J Journal of Educational Measurement %D 2010 %T Stratified and maximum information item selection procedures in computer adaptive testing %A Deng, H. %A Ansley, T. %A Chang, H.-H. %B Journal of Educational Measurement %V 47 %P 202-226 %G Eng %0 Journal Article %J Journal of Applied Measurement %D 2010 %T The use of PROMIS and assessment center to deliver patient-reported outcome measures in clinical research %A Gershon, R. C. %A Rothrock, N. %A Hanrahan, R. %A Bass, M. %A Cella, D. %X The Patient-Reported Outcomes Measurement Information System (PROMIS) was developed as one of the first projects funded by the NIH Roadmap for Medical Research Initiative to re-engineer the clinical research enterprise. The primary goal of PROMIS is to build item banks and short forms that measure key health outcome domains that are manifested in a variety of chronic diseases which could be used as a "common currency" across research projects. To date, item banks, short forms and computerized adaptive tests (CAT) have been developed for 13 domains with relevance to pediatric and adult subjects. To enable easy delivery of these new instruments, PROMIS built a web-based resource (Assessment Center) for administering CATs and other self-report data, tracking item and instrument development, monitoring accrual, managing data, and storing statistical analysis results. Assessment Center can also be used to deliver custom researcher developed content, and has numerous features that support both simple and complicated accrual designs (branching, multiple arms, multiple time points, etc.). This paper provides an overview of the development of the PROMIS item banks and details Assessment Center functionality. 
%B Journal of Applied Measurement %V 11 %P 304-314 %@ 1529-7713 %G eng %0 Generic %D 2010 %T Validation of a computer-adaptive test to evaluate generic health-related quality of life %A Rebollo, P. %A Castejon, I. %A Cuervo, J. %A Villa, G. %A Garcia-Cueto, E. %A Diaz-Cuervo, H. %A Zardain, P. C. %A Muniz, J. %A Alonso, J. %X BACKGROUND: Health Related Quality of Life (HRQoL) is a relevant variable in the evaluation of health outcomes. Questionnaires based on Classical Test Theory typically require a large number of items to evaluate HRQoL. Computer Adaptive Testing (CAT) can be used to reduce tests length while maintaining and, in some cases, improving accuracy. This study aimed at validating a CAT based on Item Response Theory (IRT) for evaluation of generic HRQoL: the CAT-Health instrument. METHODS: Cross-sectional study of subjects aged over 18 attending Primary Care Centres for any reason. CAT-Health was administered along with the SF-12 Health Survey. Age, gender and a checklist of chronic conditions were also collected. CAT-Health was evaluated considering: 1) feasibility: completion time and test length; 2) content range coverage, Item Exposure Rate (IER) and test precision; and 3) construct validity: differences in the CAT-Health scores according to clinical variables and correlations between both questionnaires. RESULTS: 396 subjects answered CAT-Health and SF-12, 67.2% females, mean age (SD) 48.6 (17.7) years. 36.9% did not report any chronic condition. Median completion time for CAT-Health was 81 seconds (IQ range = 59-118) and it increased with age (p < 0.001). The median number of items administered was 8 (IQ range = 6-10). Neither ceiling nor floor effects were found for the score. None of the items in the pool had an IER of 100% and it was over 5% for 27.1% of the items. Test Information Function (TIF) peaked between levels -1 and 0 of HRQoL. 
Statistically significant differences were observed in the CAT-Health scores according to the number and type of conditions. CONCLUSIONS: Although domain-specific CATs exist for various areas of HRQoL, CAT-Health is one of the first IRT-based CATs designed to evaluate generic HRQoL, and it has proven feasible, valid and efficient when administered to a broad sample of individuals attending primary care settings. %B Health and Quality of Life Outcomes %7 2010/12/07 %V 8 %P 147 %@ 1477-7525 (Electronic)1477-7525 (Linking) %G eng %M 21129169 %2 3022567 %0 Journal Article %J Computers and Education %D 2009 %T An adaptive testing system for supporting versatile educational assessment %A Huang, Y-M. %A Lin, Y-T. %A Cheng, S-C. %K Architectures for educational technology system %K Distance education and telelearning %X With the rapid growth of computer and mobile technology, it is a challenge to integrate computer-based testing (CBT) with mobile learning (m-learning), especially for formative assessment and self-assessment. In terms of self-assessment, a computer adaptive test (CAT) is a suitable way to enable students to evaluate themselves. In CAT, students are assessed through a process that uses item response theory (IRT), a well-founded psychometric theory. Furthermore, a large item bank is indispensable to a test, but when a CAT system has a large item bank, IRT-based test item selection becomes computationally burdensome. Besides the large item bank, an item exposure control mechanism is also essential to a testing system. However, IRT by itself addresses neither of these issues. These reasons motivated the authors to carry out this study. This paper describes a design aimed at the development and implementation of an adaptive testing system. The system can support several assessment functions and different devices. Moreover, the researchers apply a novel approach, particle swarm optimization (PSO), to alleviate the computational complexity and resolve the problem of item exposure.
Throughout the development of the system, a formative evaluation was embedded as an integral part of the design methodology and was used to improve the system. After the system was formally released on the web, questionnaires and experiments were conducted to evaluate the usability, precision, and efficiency of the system. The results of these evaluations indicated that the system provides adaptive testing for different devices and supports versatile assessment functions. Moreover, the system can estimate students' ability reliably and validly and conduct an adaptive test efficiently. Furthermore, the computational complexity of the system was alleviated by the PSO approach. With this approach, the test item selection procedure becomes efficient, and the average best fitness values are very close to the optimal solutions. %B Computers and Education %V 52 %P 53-67 %@ 0360-1315 %G eng %0 Book Section %D 2009 %T Adequacy of an item pool measuring proficiency in English language to implement a CAT procedure %A Karino, C. A. %A Costa, D. R. %A Laros, J. A. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T Applications of CAT in admissions to higher education in Israel: Twenty-two years of experience %A Gafni, N. %A Cohen, Y. %A Roded, K. %A Baumer, M. %A Moshinsky, A. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T A burdened CAT: Incorporating response burden with maximum Fisher's information for item selection %A Swartz, R. J. %A Choi, S. W. %X Widely used in various educational and vocational assessment applications, computerized adaptive testing (CAT) has recently begun to be used to measure patient-reported outcomes. Although successful in reducing respondent burden, most current CAT algorithms do not formally consider it as part of the item selection process.
This study used a loss function approach motivated by decision theory to develop an item selection method that incorporates respondent burden into the item selection process based on maximum Fisher information (MFI) item selection. Several different loss functions placing varying degrees of importance on respondent burden were compared, using an item bank of 62 polytomous items measuring depressive symptoms. One dataset consisted of the real responses from the 730 subjects who responded to all the items. A second dataset consisted of simulated responses to all the items based on a grid of latent trait scores with replicates at each grid point. The algorithm enables a CAT administrator to control the respondent burden more efficiently than when using MFI alone, without severely affecting measurement precision. In particular, the loss function incorporating respondent burden protected respondents from receiving longer tests when their estimated trait score fell in a region where there were few informative items. %C In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T Comparison of adaptive Bayesian estimation and weighted Bayesian estimation in multidimensional computerized adaptive testing %A Chen, P. H. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Applied Psychological Measurement %D 2009 %T Comparison of CAT Item Selection Criteria for Polytomous Items %A Choi, Seung W. %A Swartz, Richard J. %X

Item selection is a core component in computerized adaptive testing (CAT). Several studies have evaluated new and classical selection methods; however, the few that have applied such methods to the use of polytomous items have reported conflicting results. To clarify these discrepancies and further investigate selection method properties, six different selection methods are compared systematically. The results showed no clear benefit from more sophisticated selection criteria and showed one method previously believed to be superior—the maximum expected posterior weighted information (MEPWI)—to be mathematically equivalent to a simpler method, the maximum posterior weighted information (MPWI).

%B Applied Psychological Measurement %V 33 %P 419-440 %U http://apm.sagepub.com/content/33/6/419.abstract %R 10.1177/0146621608327801 %0 Journal Article %J Applied Psychological Measurement %D 2009 %T Comparison of CAT item selection criteria for polytomous items %A Choi, S. W. %A Swartz, R.J.. %B Applied Psychological Measurement %V 33 %P 419–440 %G eng %0 Book Section %D 2009 %T A comparison of three methods of item selection for computerized adaptive testing %A Costa, D. R. %A Karino, C. A. %A Moura, F. A. S. %A Andrade, D. F. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T Computerized adaptive testing for cognitive diagnosis %A Cheng, Y %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2009 %T Constraint-Weighted a-Stratification for Computerized Adaptive Testing With Nonstatistical Constraints %A Ying Cheng, %A Chang, Hua-Hua %A Douglas, Jeffrey %A Fanmin Guo, %X

a-stratification is a method that utilizes items with small discrimination (a) parameters early in an exam and those with higher a values when more is learned about the ability parameter. It can achieve much better item usage than the maximum information criterion (MIC). To make a-stratification more practical and more widely applicable, a method for weighting the item selection process in a-stratification as a means of satisfying multiple test constraints is proposed. This method is studied in simulation against an analogous method without stratification, as well as a-stratification using descending- rather than ascending-a procedures. In addition, a variation of a-stratification that allows for unbalanced usage of a parameters is included in the study to examine the trade-off between efficiency and exposure control. Finally, MIC and randomized item selection are included as baseline measures. Results indicate that the weighting mechanism successfully addresses the constraints, that stratification helps to a great extent in balancing exposure rates, and that the ascending-a design improves measurement precision.

%B Educational and Psychological Measurement %V 69 %P 35-49 %U http://epm.sagepub.com/content/69/1/35.abstract %R 10.1177/0013164408322030 %0 Journal Article %J Educational and Psychological Measurement %D 2009 %T Constraint-weighted a-stratification for computerized adaptive testing with nonstatistical constraints: Balancing measurement efficiency and exposure control %A Cheng, Y %A Chang, Hua-Hua %A Douglas, J. %A Guo, F. %B Educational and Psychological Measurement %V 69 %P 35-49 %G eng %0 Journal Article %J Measurement: Interdisciplinary Research and Perspectives %D 2009 %T Diagnostic classification models and multidimensional adaptive testing: A commentary on Rupp and Templin. %A Frey, A. %A Carstensen, C. H. %B Measurement: Interdisciplinary Research and Perspectives %V 7 %P 58-61 %0 Journal Article %J Applied Psychological Measurement %D 2009 %T Firestar: Computerized adaptive testing simulation program for polytomous IRT models %A Choi, S. W. %B Applied Psychological Measurement %V 33 %P 644–645 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2009 %T Firestar: Computerized adaptive testing simulation program for polytomous IRT models %A Choi, S. W. %B Applied Psychological Measurement %7 2009/12/17 %V 33 %P 644-645 %8 Nov 1 %@ 1552-3497 (Electronic)0146-6216 (Linking) %G Eng %M 20011609 %2 2790213 %0 Book Section %D 2009 %T Kullback-Leibler information in multidimensional adaptive testing: theory and application %A Wang, C. %A Chang, Hua-Hua %X Built on multidimensional item response theory (MIRT), multidimensional adaptive testing (MAT) can, in principle, provide a promising choice to ensuring efficient estimation of each ability dimension in a multidimensional vector. Currently, two item selection procedures have been developed for MAT, one based on Fisher information embedded within a Bayesian framework, and the other powered by Kullback-Leibler (KL) information. 
It is well known that in unidimensional IRT the second derivative of KL information (also termed “global information”) is Fisher information evaluated at θ0. This paper first generalizes the relationship between these two types of information in two ways: the analytical result is given as well as the graphical representation, to enhance interpretation and understanding. Second, a KL information index is constructed for MAT, which represents the integration of KL information over all of the ability dimensions. This paper further discusses how this index correlates with the item discrimination parameters. The analytical results lay the foundation for future development of item selection methods in MAT that can help equalize the item exposure rate. Finally, a simulation study is conducted to verify the above results. The connection between the item parameters, item KL information, and item exposure rate is demonstrated for empirical MAT delivered by an item bank calibrated under two-dimensional IRT. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J British Journal of Mathematical and Statistical Psychology %D 2009 %T The maximum priority index method for severely constrained item selection in computerized adaptive testing %A Cheng, Y %A Chang, Hua-Hua %K Aptitude Tests/*statistics & numerical data %K Diagnosis, Computer-Assisted/*statistics & numerical data %K Educational Measurement/*statistics & numerical data %K Humans %K Mathematical Computing %K Models, Statistical %K Personality Tests/*statistics & numerical data %K Psychometrics/*statistics & numerical data %K Reproducibility of Results %K Software %X This paper introduces a new heuristic approach, the maximum priority index (MPI) method, for severely constrained item selection in computerized adaptive testing. 
Our simulation study shows that it is able to accommodate various non-statistical constraints simultaneously, such as content balancing, exposure control, answer key balancing, and so on. Compared with the weighted deviation modelling method, it leads to fewer constraint violations and better exposure control while maintaining the same level of measurement precision. %B British Journal of Mathematical and Statistical Psychology %7 2008/06/07 %V 62 %P 369-83 %8 May %@ 0007-1102 (Print)0007-1102 (Linking) %G eng %M 18534047 %0 Book Section %D 2009 %T Obtaining reliable diagnostic information through constrained CAT %A Wang, C. %A Chang, Hua-Hua %A Douglas, J. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T Optimizing item exposure control algorithms for polytomous computerized adaptive tests with restricted item banks %A Chajewski, M. %A Lewis, C. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Journal of Rheumatology %D 2009 %T Progress in assessing physical function in arthritis: PROMIS short forms and computerized adaptive testing %A Fries, J.F. %A Cella, D. %A Rose, M. %A Krishnan, E. %A Bruce, B. %K *Disability Evaluation %K *Outcome Assessment (Health Care) %K Arthritis/diagnosis/*physiopathology %K Health Surveys %K Humans %K Prognosis %K Reproducibility of Results %X OBJECTIVE: Assessing self-reported physical function/disability with the Health Assessment Questionnaire Disability Index (HAQ) and other instruments has become central in arthritis research. Item response theory (IRT) and computerized adaptive testing (CAT) techniques can increase reliability and statistical power. IRT-based instruments can improve measurement precision substantially over a wider range of disease severity. These modern methods were applied and the magnitude of improvement was estimated. 
METHODS: A 199-item physical function/disability item bank was developed by distilling 1,865 items to 124, including Legacy Health Assessment Questionnaire (HAQ) and Physical Function-10 items, and improving precision through qualitative and quantitative evaluation in over 21,000 subjects, which included about 1,500 patients with rheumatoid arthritis and osteoarthritis. Four new instruments, (A) Patient-Reported Outcomes Measurement Information System (PROMIS) HAQ, which evolved from the original (Legacy) HAQ; (B) "best" PROMIS 10; (C) 20-item static (short) forms; and (D) simulated PROMIS CAT, which sequentially selected the most informative item, were compared with the HAQ. RESULTS: Online and mailed administration modes yielded similar item and domain scores. The HAQ and PROMIS HAQ 20-item scales yielded greater information content versus other scales in patients with more severe disease. The "best" PROMIS 20-item scale outperformed the other 20-item static forms over a broad range of 4 standard deviations. The 10-item simulated PROMIS CAT outperformed all other forms. CONCLUSION: Improved items and instruments yielded better information. The PROMIS HAQ is currently available and considered validated. The new PROMIS short forms, after validation, are likely to represent further improvement. CAT-based physical function/disability assessment offers superior performance over static forms of equal length. %B Journal of Rheumatology %7 2009/09/10 %V 36 %P 2061-2066 %8 Sep %@ 0315-162X (Print)0315-162X (Linking) %G eng %M 19738214 %0 Journal Article %J Health and Quality of Life Outcomes %D 2009 %T Reduction in patient burdens with graphical computerized adaptive testing on the ADL scale: tool development and simulation %A Chien, T. W. %A Wu, H. M. %A Wang, W-C. %A Castillo, R. V. %A Chou, W. 
%K *Activities of Daily Living %K *Computer Graphics %K *Computer Simulation %K *Diagnosis, Computer-Assisted %K Female %K Humans %K Male %K Point-of-Care Systems %K Reproducibility of Results %K Stroke/*rehabilitation %K Taiwan %K United States %X BACKGROUND: The aim of this study was to verify the effectiveness and efficacy of saving time and reducing burden for patients, nurses, and even occupational therapists through computer adaptive testing (CAT). METHODS: Based on an item bank of the Barthel Index (BI) and the Frenchay Activities Index (FAI) for assessing comprehensive activities of daily living (ADL) function in stroke patients, we developed a Visual Basic for Applications (VBA)-Excel CAT module, and (1) investigated whether the averaged test length via CAT is shorter than that of the traditional all-item-answered non-adaptive testing (NAT) approach through simulation, (2) illustrated the CAT multimedia on a tablet PC showing data collection and response errors of ADL clinical functional measures in stroke patients, and (3) demonstrated the quality control of the endorsing scale with fit statistics to detect responding errors, which will be immediately reconfirmed by technicians once the patient ends the CAT assessment. RESULTS: The results show that endorsed items could be shorter on CAT (M = 13.42) than on NAT (M = 23) at 41.64% efficiency in test length. However, averaged ability estimations reveal insignificant differences between CAT and NAT. CONCLUSION: This study found that mobile nursing services placed at the bedsides of patients could, through the programmed VBA-Excel CAT module, reduce the burden to patients and save time, more so than the traditional NAT paper-and-pencil testing appraisals. 
%B Health and Quality of Life Outcomes %7 2009/05/07 %V 7 %P 39 %@ 1477-7525 (Electronic)1477-7525 (Linking) %G eng %M 19416521 %2 2688502 %0 Journal Article %J Psychometrika %D 2009 %T When cognitive diagnosis meets computerized adaptive testing: CD-CAT %A Cheng, Y %B Psychometrika %V 74 %P 619-632 %G eng %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2008 %T Assessing self-care and social function using a computer adaptive testing version of the pediatric evaluation of disability inventory %A Coster, W. J. %A Haley, S. M. %A Ni, P. %A Dumas, H. M. %A Fragala-Pinkham, M. A. %K *Disability Evaluation %K *Social Adjustment %K Activities of Daily Living %K Adolescent %K Age Factors %K Child %K Child, Preschool %K Computer Simulation %K Cross-Over Studies %K Disabled Children/*rehabilitation %K Female %K Follow-Up Studies %K Humans %K Infant %K Male %K Outcome Assessment (Health Care) %K Reference Values %K Reproducibility of Results %K Retrospective Studies %K Risk Factors %K Self Care/*standards/trends %K Sex Factors %K Sickness Impact Profile %X OBJECTIVE: To examine score agreement, validity, precision, and response burden of a prototype computer adaptive testing (CAT) version of the self-care and social function scales of the Pediatric Evaluation of Disability Inventory compared with the full-length version of these scales. DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics; community-based day care, preschool, and children's homes. PARTICIPANTS: Children with disabilities (n=469) and 412 children with no disabilities (analytic sample); 38 children with disabilities and 35 children without disabilities (cross-validation sample). INTERVENTIONS: Not applicable. 
MAIN OUTCOME MEASURES: Summary scores from prototype CAT applications of each scale using 15-, 10-, and 5-item stopping rules; scores from the full-length self-care and social function scales; time (in seconds) to complete assessments and respondent ratings of burden. RESULTS: Scores from both computer simulations and field administration of the prototype CATs were highly consistent with scores from full-length administration (r range, .94-.99). Using computer simulation of retrospective data, discriminant validity and sensitivity to change of the CATs closely approximated those of the full-length scales, especially when the 15- and 10-item stopping rules were applied. In the cross-validation study, the time to administer both CATs was 4 minutes, compared with over 16 minutes to complete the full-length scales. CONCLUSIONS: Self-care and social function score estimates from CAT administration are highly comparable with those obtained from full-length scale administration, with small losses in validity and precision and substantial decreases in administration time. %B Archives of Physical Medicine and Rehabilitation %7 2008/04/01 %V 89 %P 622-629 %8 Apr %@ 1532-821X (Electronic)0003-9993 (Linking) %G eng %M 18373991 %2 2666276 %0 Journal Article %J Behavioral Research Methods %D 2008 %T Combining computer adaptive testing technology with cognitively diagnostic assessment %A McGlohen, M. %A Chang, Hua-Hua %K *Cognition %K *Computers %K *Models, Statistical %K *User-Computer Interface %K Diagnosis, Computer-Assisted/*instrumentation %K Humans %X A major advantage of computerized adaptive testing (CAT) is that it allows the test to home in on an examinee's ability level in an interactive manner. The aim of the new area of cognitive diagnosis is to provide information about specific content areas in which an examinee needs help. The goal of this study was to combine the benefit of specific feedback from cognitively diagnostic assessment with the advantages of CAT. 
In this study, three approaches to combining these were investigated: (1) item selection based on the traditional ability level estimate (theta), (2) item selection based on the attribute mastery feedback provided by cognitively diagnostic assessment (alpha), and (3) item selection based on both the traditional ability level estimate (theta) and the attribute mastery feedback provided by cognitively diagnostic assessment (alpha). The results from these three approaches were compared for theta estimation accuracy, attribute mastery estimation accuracy, and item exposure control. The theta- and alpha-based condition outperformed the alpha-based condition regarding theta estimation, attribute mastery pattern estimation, and item exposure control. Both the theta-based condition and the theta- and alpha-based condition performed similarly with regard to theta estimation, attribute mastery estimation, and item exposure control, but the theta- and alpha-based condition has an additional advantage in that it uses the shadow test method, which allows the administrator to incorporate additional constraints in the item selection process, such as content balancing, item type constraints, and so forth, and also to select items on the basis of both the current theta and alpha estimates, which can be built on top of existing 3PL testing programs. %B Behavioral Research Methods %7 2008/08/14 %V 40 %P 808-21 %8 Aug %@ 1554-351X (Print) %G eng %M 18697677 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2008 %T Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes %A Haley, S. M. %A Gandek, B. %A Siebens, H. %A Black-Schaffer, R. M. %A Sinclair, S. J. %A Tao, W. %A Coster, W. J. %A Ni, P. %A Jette, A. M. 
%K *Activities of Daily Living %K *Adaptation, Physiological %K *Computer Systems %K *Questionnaires %K Adult %K Aged %K Aged, 80 and over %K Chi-Square Distribution %K Factor Analysis, Statistical %K Female %K Humans %K Longitudinal Studies %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %K Patient Discharge %K Prospective Studies %K Rehabilitation/*standards %K Subacute Care/*standards %X OBJECTIVES: To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness. DESIGN: Longitudinal, prospective cohort study of patients interviewed approximately 2 weeks after discharge from inpatient rehabilitation and 3 months later. SETTING: Follow-up interviews conducted in patient's home setting. PARTICIPANTS: Adults (N=94) with diagnoses of neurologic, orthopedic, or medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Participation domains of mobility, domestic life, and community, social, & civic life, measured using a CAT version of the Participation Measure for Postacute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53). RESULTS: The PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (intraclass correlation coefficient, model 3,1, .71-.81). On average, the PM-PAC-CAT was completed in 42% of the time and with only 48% of the items as compared with the PM-PAC-53. Both formats discriminated across functional severity groups. The PM-PAC-CAT had modest reductions in sensitivity and responsiveness to patient-reported change over a 3-month interval as compared with the PM-PAC-53. CONCLUSIONS: Although continued evaluation is warranted, accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a sizeable reduction in respondent burden. 
%B Archives of Physical Medicine and Rehabilitation %7 2008/01/30 %V 89 %P 275-283 %8 Feb %@ 1532-821X (Electronic)0003-9993 (Linking) %G eng %M 18226651 %2 2666330 %0 Journal Article %J British Journal of Mathematical and Statistical Psychology %D 2008 %T Controlling item exposure and test overlap on the fly in computerized adaptive testing %A Chen, S-Y. %A Lei, P. W. %A Liao, W. H. %K *Decision Making, Computer-Assisted %K *Models, Psychological %K Humans %X This paper proposes an on-line version of the Sympson and Hetter procedure with test overlap control (SHT) that can provide item exposure control at both the item and test levels on the fly without iterative simulations. The on-line procedure is similar to the SHT procedure in that exposure parameters are used for simultaneous control of item exposure rates and test overlap rate. The exposure parameters for the on-line procedure, however, are updated sequentially on the fly, rather than through iterative simulations conducted prior to operational computerized adaptive tests (CATs). Unlike the SHT procedure, the on-line version can control item exposure rate and test overlap rate without time-consuming iterative simulations even when item pools or examinee populations have been changed. Moreover, the on-line procedure was found to perform better than the SHT procedure in controlling item exposure and test overlap for examinees who take tests earlier. Compared with two other on-line alternatives, this proposed on-line method provided the best all-around test security control. Thus, it would be an efficient procedure for controlling item exposure and test overlap in CATs. %B British Journal of Mathematical and Statistical Psychology %7 2007/07/26 %V 61 %P 471-92 %8 Nov %@ 0007-1102 (Print)0007-1102 (Linking) %G eng %M 17650362 %0 Journal Article %J Psychological Testing %D 2008 %T Investigating item exposure control on the fly in computerized adaptive testing %A Wu, M.-L. %A Chen, S-Y. 
%B Psychological Testing %V 55 %P 1-32 %G eng %0 Journal Article %J Psychological Testing %D 2008 %T Item exposure control in a-stratified computerized adaptive testing %A Jhu, Y.-J. %A Chen, S-Y. %B Psychological Testing %V 55 %P 793-811 %G eng %0 Journal Article %J Spine %D 2008 %T Letting the CAT out of the bag: Comparing computer adaptive tests and an 11-item short form of the Roland-Morris Disability Questionnaire %A Cook, K. F. %A Choi, S. W. %A Crane, P. K. %A Deyo, R. A. %A Johnson, K. L. %A Amtmann, D. %K *Disability Evaluation %K *Health Status Indicators %K Adult %K Aged %K Aged, 80 and over %K Back Pain/*diagnosis/psychology %K Calibration %K Computer Simulation %K Diagnosis, Computer-Assisted/*standards %K Humans %K Middle Aged %K Models, Psychological %K Predictive Value of Tests %K Questionnaires/*standards %K Reproducibility of Results %X STUDY DESIGN: A post hoc simulation of a computer adaptive administration of the items of a modified version of the Roland-Morris Disability Questionnaire. OBJECTIVE: To evaluate the effectiveness of adaptive administration of back pain-related disability items compared with a fixed 11-item short form. SUMMARY OF BACKGROUND DATA: Short form versions of the Roland-Morris Disability Questionnaire have been developed. An alternative to paper-and-pencil short forms is to administer items adaptively so that items are presented based on a person's responses to previous items. Theoretically, this allows precise estimation of back pain disability with administration of only a few items. MATERIALS AND METHODS: Data were gathered from 2 previously conducted studies of persons with back pain. An item response theory model was used to calibrate scores based on all items, items of a paper-and-pencil short form, and several computer adaptive tests (CATs). RESULTS: Correlations between each CAT condition and scores based on a 23-item version of the Roland-Morris Disability Questionnaire ranged from 0.93 to 0.98. 
Compared with an 11-item short form, an 11-item CAT produced scores that were significantly more highly correlated with scores based on the 23-item scale. CATs with even fewer items also produced scores that were highly correlated with scores based on all items. For example, scores from a 5-item CAT had a correlation of 0.93 with full scale scores. Seven- and 9-item CATs correlated at 0.95 and 0.97, respectively. A CAT with a standard-error-based stopping rule produced scores that correlated at 0.95 with full scale scores. CONCLUSION: A CAT-based back pain-related disability measure may be a valuable tool for use in clinical and research contexts. Use of CAT for other common measures in back pain research, such as other functional scales or measures of psychological distress, may offer similar advantages. %B Spine %7 2008/05/23 %V 33 %P 1378-83 %8 May 20 %@ 1528-1159 (Electronic) %G eng %M 18496352 %0 Journal Article %J American Journal of Pharmaceutical Education %D 2008 %T The NAPLEX: evolution, purpose, scope, and educational implications %A Newton, D. W. %A Boyle, M. %A Catizone, C. A. %K *Educational Measurement %K Education, Pharmacy/*standards %K History, 20th Century %K History, 21st Century %K Humans %K Licensure, Pharmacy/history/*legislation & jurisprudence %K North America %K Pharmacists/*legislation & jurisprudence %K Software %X Since 2004, passing the North American Pharmacist Licensure Examination (NAPLEX) has been a requirement for earning initial pharmacy licensure in all 50 United States. The creation and evolution from 1952-2005 of the particular pharmacy competency testing areas and quantities of questions are described for the former paper-and-pencil National Association of Boards of Pharmacy Licensure Examination (NABPLEX) and the current candidate-specific computer adaptive NAPLEX pharmacy licensure examinations. 
A 40% increase in the weighting of NAPLEX Blueprint Area 2 in May 2005, compared to that in the preceding 1997-2005 Blueprint, has implications for candidates' NAPLEX performance and associated curricular content and instruction. New pharmacy graduates' scores on the NAPLEX are neither intended nor validated to serve as a criterion for assessing or judging the quality or effectiveness of pharmacy curricula and instruction. The newest cycle of NAPLEX Blueprint revision, a continual process to ensure representation of nationwide contemporary practice, began in early 2008. It may take up to 2 years, including surveying several thousand national pharmacists, to complete. %B American Journal of Pharmaceutical Education %7 2008/05/17 %V 72 %P 33 %8 Apr 15 %@ 1553-6467 (Electronic)0002-9459 (Linking) %G eng %M 18483600 %2 2384208 %0 Journal Article %J British Journal of Mathematical and Statistical Psychology %D 2008 %T Predicting item exposure parameters in computerized adaptive testing %A Chen, S-Y. %A Doong, S. H. %K *Algorithms %K *Artificial Intelligence %K Aptitude Tests/*statistics & numerical data %K Diagnosis, Computer-Assisted/*statistics & numerical data %K Humans %K Models, Statistical %K Psychometrics/statistics & numerical data %K Reproducibility of Results %K Software %X The purpose of this study is to find a formula that describes the relationship between item exposure parameters and item parameters in computerized adaptive tests by using genetic programming (GP) - a biologically inspired artificial intelligence technique. Based on the formula, item exposure parameters for new parallel item pools can be predicted without conducting additional iterative simulations. Results show that an interesting formula between item exposure parameters and item parameters in a pool can be found by using GP. 
The item exposure parameters predicted based on the found formula were close to those observed from the Sympson and Hetter (1985) procedure and performed well in controlling item exposure rates. Similar results were observed for the Stocking and Lewis (1998) multinomial model for item selection and the Sympson and Hetter procedure with content balancing. The proposed GP approach has provided a knowledge-based solution for finding item exposure parameters. %B British Journal of Mathematical and Statistical Psychology %7 2008/05/17 %V 61 %P 75-91 %8 May %@ 0007-1102 (Print)0007-1102 (Linking) %G eng %M 18482476 %0 Journal Article %J Applied Psychological Measurement %D 2008 %T Severity of Organized Item Theft in Computerized Adaptive Testing: A Simulation Study %A Yi, Qing %A Zhang, Jinming %A Chang, Hua-Hua %X

Criteria have been proposed for assessing the severity of possible test security violations for computerized tests with high-stakes outcomes. However, these criteria resulted from theoretical derivations that assumed uniformly randomized item selection. This study investigated potential damage caused by organized item theft in computerized adaptive testing (CAT) for two realistic item selection methods, maximum item information and a-stratified with content blocking, using the randomized method as a baseline for comparison. Damage caused by organized item theft was evaluated by the number of compromised items each examinee could encounter and the impact of the compromised items on examinees' ability estimates. Severity of test security violation was assessed under self-organized and organized item theft simulation scenarios. Results indicated that though item theft could cause severe damage to CAT with either item selection method, the maximum item information method was more vulnerable to the organized item theft simulation than was the a-stratified method.

%B Applied Psychological Measurement %V 32 %P 543-558 %U http://apm.sagepub.com/content/32/7/543.abstract %R 10.1177/0146621607311336 %0 Journal Article %J Psychometrika %D 2008 %T To Weight Or Not To Weight? Balancing Influence Of Initial Items In Adaptive Testing %A Chang, H.-H. %A Ying, Z. %X

It has been widely reported that in computerized adaptive testing some examinees may get much lower scores than they normally would if an alternative paper-and-pencil version were given. The main purpose of this investigation is to quantitatively reveal the cause for the underestimation phenomenon. The logistic models, including the 1PL, 2PL, and 3PL models, are used to demonstrate our assertions. Our analytical derivation shows that, under the maximum information item selection strategy, if an examinee failed a few items at the beginning of the test, easy but more discriminating items are likely to be administered. Such items are ineffective at moving the estimate close to the true theta, unless the test is sufficiently long or a variable-length test is used. Our results also indicate that a certain weighting mechanism is necessary to make the algorithm rely less on the items administered at the beginning of the test.

%B Psychometrika %V 73 %P 441-450 %N 3 %R 10.1007/s11336-007-9047-7 %0 Book Section %D 2007 %T Adaptive testing with the multi-unidimensional pairwise preference model %A Stark, S. %A Chernyshenko, O. S. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Journal of Elementary Education %D 2007 %T The comparison of maximum likelihood estimation and expected a posteriori in CAT using the graded response model %A Chen, S-K. %B Journal of Elementary Education %V 19 %P 339-371 %G eng %0 Book Section %D 2007 %T Computerized attribute-adaptive testing: A new computerized adaptive testing approach incorporating cognitive psychology %A Zhou, J. %A Gierl, M. J. %A Cui, Y. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Journal of Educational Measurement %D 2007 %T Detecting Differential Speededness in Multistage Testing %A van der Linden, Wim J. %A Breithaupt, Krista %A Chuah, Siang Chee %A Zhang, Yanwei %X

A potential undesirable effect of multistage testing is differential speededness, which happens if some of the test takers run out of time because they receive subtests with items that are more time intensive than others. This article shows how a probabilistic response-time model can be used for estimating differences in time intensities and speed between subtests and test takers and detecting differential speededness. An empirical data set for a multistage test in the computerized CPA Exam was used to demonstrate the procedures. Although the more difficult subtests appeared to have items that were more time intensive than the easier subtests, an analysis of the residual response times did not reveal any significant differential speededness because the time limit appeared to be appropriate. In a separate analysis, within each of the subtests, we found minor but consistent patterns of residual times that are believed to be due to a warm-up effect, that is, test takers using more time on the initial items than they actually need.

%B Journal of Educational Measurement %V 44 %P 117–130 %U http://dx.doi.org/10.1111/j.1745-3984.2007.00030.x %R 10.1111/j.1745-3984.2007.00030.x %0 Journal Article %J Quality of Life Research %D 2007 %T Developing tailored instruments: item banking and computerized adaptive assessment %A Bjorner, J. B. %A Chang, C-H. %A Thissen, D. %A Reeve, B. B. %K *Health Status %K *Health Status Indicators %K *Mental Health %K *Outcome Assessment (Health Care) %K *Quality of Life %K *Questionnaires %K *Software %K Algorithms %K Factor Analysis, Statistical %K Humans %K Models, Statistical %K Psychometrics %X Item banks and Computerized Adaptive Testing (CAT) have the potential to greatly improve the assessment of health outcomes. This review describes the unique features of item banks and CAT and discusses how to develop item banks. In CAT, a computer selects the items from an item bank that are most relevant for and informative about the particular respondent; thus optimizing test relevance and precision. Item response theory (IRT) provides the foundation for selecting the items that are most informative for the particular respondent and for scoring responses on a common metric. The development of an item bank is a multi-stage process that requires a clear definition of the construct to be measured, good items, a careful psychometric analysis of the items, and a clear specification of the final CAT. The psychometric analysis needs to evaluate the assumptions of the IRT model such as unidimensionality and local independence; that the items function the same way in different subgroups of the population; and that there is an adequate fit between the data and the chosen item response models. Also, interpretation guidelines need to be established to help the clinical application of the assessment. Although medical research can draw upon expertise from educational testing in the development of item banks and CAT, the medical field also encounters unique opportunities and challenges. 
%B Quality of Life Research %7 2007/05/29 %V 16 %P 95-108 %@ 0962-9343 (Print) %G eng %M 17530450 %0 Journal Article %J Educational Assessment %D 2007 %T The effect of including pretest items in an operational computerized adaptive test: Do different ability examinees spend different amounts of time on embedded pretest items? %A Ferdous, A. A. %A Plake, B. S. %A Chang, S-R. %K ability %K operational computerized adaptive test %K pretest items %K time %X The purpose of this study was to examine the effect of pretest items on response time in an operational, fixed-length, time-limited computerized adaptive test (CAT). These pretest items are embedded within the CAT, but unlike the operational items, are not tailored to the examinee's ability level. If examinees with higher ability levels need less time to complete these items than do their counterparts with lower ability levels, they will have more time to devote to the operational test questions. Data were from a graduate admissions test that was administered worldwide. Data from both quantitative and verbal sections of the test were considered. For the verbal section, examinees in the lower ability groups spent systematically more time on their pretest items than did those in the higher ability groups, though for the quantitative section the differences were less clear. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Educational Assessment %I Lawrence Erlbaum: US %V 12 %P 161-173 %@ 1062-7197 (Print); 1532-6977 (Electronic) %G eng %M 2007-06685-003 %0 Journal Article %J Quality of Life Research %D 2007 %T The future of outcomes measurement: item banking, tailored short-forms, and computerized adaptive assessment %A Cella, D. %A Gershon, R. C. %A Lai, J-S. %A Choi, S. W. 
%X The use of item banks and computerized adaptive testing (CAT) begins with clear definitions of important outcomes, and references those definitions to specific questions gathered into large and well-studied pools, or “banks” of items. Items can be selected from the bank to form customized short scales, or can be administered in a sequence and length determined by a computer programmed for precision and clinical relevance. Although far from perfect, such item banks can form a common definition and understanding of human symptoms and functional problems such as fatigue, pain, depression, mobility, social function, sensory function, and many other health concepts that we can only measure by asking people directly. The support of the National Institutes of Health (NIH), as witnessed by its cooperative agreement with measurement experts through the NIH Roadmap Initiative known as PROMIS (www.nihpromis.org), is a big step in that direction. Our approach to item banking and CAT is practical; as focused on application as it is on science or theory. From a practical perspective, we frequently must decide whether to re-write and retest an item, add more items to fill gaps (often at the ceiling of the measure), re-test a bank after some modifications, or split up a bank into units that are more unidimensional, yet less clinically relevant or complete. These decisions are not easy, and yet they are rarely unforgiving. We encourage people to build practical tools that are capable of producing multiple short form measures and CAT administrations from common banks, and to further our understanding of these banks with various clinical populations and ages, so that with time the scores that emerge from these many activities begin to have not only a common metric and range, but a shared meaning and understanding across users. 
In this paper, we provide an overview of item banking and CAT, discuss our approach to item banking and its byproducts, describe testing options, discuss an example of CAT for fatigue, and discuss models for long term sustainability of an entity such as PROMIS. Some barriers to success include limitations in the methods themselves, controversies and disagreements across approaches, and end-user reluctance to move away from the familiar. %B Quality of Life Research %V 16 %P 133-141 %@ 0962-9343 %G eng %0 Journal Article %J Journal of Rheumatology %D 2007 %T Improving patient reported outcomes using item response theory and computerized adaptive testing %A Chakravarty, E. F. %A Bjorner, J. B. %A Fries, J.F. %K *Rheumatic Diseases/physiopathology/psychology %K Clinical Trials %K Data Interpretation, Statistical %K Disability Evaluation %K Health Surveys %K Humans %K International Cooperation %K Outcome Assessment (Health Care)/*methods %K Patient Participation/*methods %K Research Design/*trends %K Software %X OBJECTIVE: Patient reported outcomes (PRO) are considered central outcome measures for both clinical trials and observational studies in rheumatology. More sophisticated statistical models, including item response theory (IRT) and computerized adaptive testing (CAT), will enable critical evaluation and reconstruction of currently utilized PRO instruments to improve measurement precision while reducing item burden on the individual patient. METHODS: We developed a domain hierarchy encompassing the latent trait of physical function/disability from the more general to most specific. Items collected from 165 English-language instruments were evaluated by a structured process including trained raters, modified Delphi expert consensus, and then patient evaluation. Each item in the refined data bank will undergo extensive analysis using IRT to evaluate response functions and measurement precision. 
CAT will allow for real-time questionnaires of potentially smaller numbers of questions tailored directly to each individual's level of physical function. RESULTS: Physical function/disability domain comprises 4 subdomains: upper extremity, trunk, lower extremity, and complex activities. Expert and patient review led to consensus favoring use of present-tense "capability" questions using a 4- or 5-item Likert response construct over past-tense "performance" items. Floor and ceiling effects, attribution of disability, and standardization of response categories were also addressed. CONCLUSION: By applying statistical techniques of IRT through use of CAT, existing PRO instruments may be improved to reduce questionnaire burden on the individual patients while increasing measurement precision that may ultimately lead to reduced sample size requirements for costly clinical trials. %B Journal of Rheumatology %7 2007/06/07 %V 34 %P 1426-31 %8 Jun %@ 0315-162X (Print) %G eng %M 17552069 %0 Journal Article %J Quality of Life Research %D 2007 %T IRT health outcomes data analysis project: an overview and summary %A Cook, K. F. %A Teal, C. R. %A Bjorner, J. B. %A Cella, D. %A Chang, C-H. %A Crane, P. K. %A Gibbons, L. E. %A Hays, R. D. %A McHorney, C. A. %A Ocepek-Welikson, K. %A Raczek, A. E. %A Teresi, J. A. %A Reeve, B. B. %K *Data Interpretation, Statistical %K *Health Status %K *Quality of Life %K *Questionnaires %K *Software %K Female %K HIV Infections/psychology %K Humans %K Male %K Neoplasms/psychology %K Outcome Assessment (Health Care)/*methods %K Psychometrics %K Stress, Psychological %X BACKGROUND: In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, "Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment."
A component of the conference was presentation of a psychometric and content analysis of a secondary dataset. OBJECTIVES: A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset. RESEARCH DESIGN: HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared. SUBJECTS: The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites. MEASURES: Items from 4 HRQOL instruments were evaluated: Cancer Rehabilitation Evaluation System-Short Form, European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, Functional Assessment of Cancer Therapy and Medical Outcomes Study Short-Form Health Survey. RESULTS AND CONCLUSIONS: Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance in the measurement of HRQOL of construct definition. With respect to these lessons, areas for future research are suggested. The feasibility of developing item banks for broad definitions of health is discussed. %B Quality of Life Research %7 2007/03/14 %V 16 %P 121-132 %@ 0962-9343 (Print) %G eng %M 17351824 %0 Journal Article %J Quality of Life Research %D 2007 %T Methodological issues for building item banks and computerized adaptive scales %A Thissen, D. %A Reeve, B. B. %A Bjorner, J. B. %A Chang, C-H. 
%X This paper reviews important methodological considerations for developing item banks and computerized adaptive scales (commonly called computerized adaptive tests in the educational measurement literature, yielding the acronym CAT), including issues of the reference population, dimensionality, dichotomous versus polytomous response scales, differential item functioning (DIF) and conditional scoring, mode effects, the impact of local dependence, and innovative approaches to assessment using CATs in health outcomes research. %B Quality of Life Research %V 16 %P 109-119 %@ 0962-9343 (Print); 1573-2649 (Electronic) %G eng %0 Book Section %D 2007 %T The modified maximum global discrimination index method for cognitive diagnostic computerized adaptive testing %A Cheng, Y. %A Chang, Hua-Hua %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing %G eng %0 Journal Article %J Quality of Life Research %D 2007 %T Patient-reported outcomes measurement and management with innovative methodologies and technologies %A Chang, C-H. %K *Health Status %K *Outcome Assessment (Health Care) %K *Quality of Life %K *Software %K Computer Systems/*trends %K Health Insurance Portability and Accountability Act %K Humans %K Patient Satisfaction %K Questionnaires %K United States %X Successful integration of modern psychometrics and advanced informatics in patient-reported outcomes (PRO) measurement and management can potentially maximize the value of health outcomes research and optimize the delivery of quality patient care. Unlike the traditional labor-intensive paper-and-pencil data collection method, item response theory-based computerized adaptive testing methodologies coupled with novel technologies provide an integrated environment to collect, analyze and present ready-to-use PRO data for informed and shared decision-making.
This article describes the needs, challenges and solutions for accurate, efficient and cost-effective PRO data acquisition and dissemination means in order to provide critical and timely PRO information necessary to actively support and enhance routine patient care in busy clinical settings. %B Quality of Life Research %7 2007/05/29 %V 16 Suppl 1 %P 157-66 %@ 0962-9343 (Print); 0962-9343 (Linking) %G eng %M 17530448 %0 Journal Article %J Medical Care %D 2007 %T The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years %A Cella, D. %A Yount, S. %A Rothrock, N. %A Gershon, R. C. %A Cook, K. F. %A Reeve, B. %A Ader, D. %A Fries, J. F. %A Bruce, B. %A Rose, M. %X The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) Roadmap initiative (www.nihpromis.org) is a 5-year cooperative group program of research designed to develop, validate, and standardize item banks to measure patient-reported outcomes (PROs) relevant across common medical conditions. In this article, we will summarize the organization and scientific activity of the PROMIS network during its first 2 years. %B Medical Care %V 45 %P S3-S11 %G eng %0 Journal Article %J Medical Care %D 2007 %T Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS) %A Reeve, B. B. %A Hays, R. D. %A Bjorner, J. B. %A Cook, K. F. %A Crane, P. K. %A Teresi, J. A. %A Thissen, D. %A Revicki, D. A. %A Weiss, D. J. %A Hambleton, R. K. %A Liu, H. %A Gershon, R. C. %A Reise, S. P. %A Lai, J. S. %A Cella, D.
%K *Health Status %K *Information Systems %K *Quality of Life %K *Self Disclosure %K Adolescent %K Adult %K Aged %K Calibration %K Databases as Topic %K Evaluation Studies as Topic %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %K Psychometrics %K Questionnaires/standards %K United States %X BACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment. %B Medical Care %7 2007/04/20 %V 45 %P S22-31 %8 May %@ 0025-7079 (Print) %G eng %M 17443115 %0 Journal Article %J Journal of Applied Measurement %D 2007 %T Relative precision, efficiency and construct validity of different starting and stopping rules for a computerized adaptive test: The GAIN Substance Problem Scale %A Riley, B. B. %A Conrad, K. J. %A Bezruczko, N. %A Dennis, M. L. 
%X Substance abuse treatment programs are being pressed to measure and make clinical decisions more efficiently about an increasing array of problems. This computerized adaptive testing (CAT) simulation examined the relative efficiency, precision and construct validity of different starting and stopping rules used to shorten the Global Appraisal of Individual Needs’ (GAIN) Substance Problem Scale (SPS) and facilitate diagnosis based on it. Data came from 1,048 adolescents and adults referred to substance abuse treatment centers in 5 sites. CAT performance was evaluated using: (1) average standard errors, (2) average number of items, (3) bias in person measures, (4) root mean squared error of person measures, (5) Cohen’s kappa to evaluate CAT classification compared to clinical classification, (6) correlation between CAT and full-scale measures, and (7) construct validity of CAT classification vs. clinical classification using correlations with five theoretically associated instruments. Results supported both CAT efficiency and validity. %B Journal of Applied Measurement %V 8 %P 48-65 %G eng %0 Journal Article %J Journal of Pain and Symptom Management %D 2007 %T A system for interactive assessment and management in palliative care %A Chang, C-H. %A Boni-Saenz, A. A. %A Durazo-Arvizu, R. A. %A DesHarnais, S. %A Lau, D. T. %A Emanuel, L. L. %K *Needs Assessment %K Humans %K Medical Informatics/*organization & administration %K Palliative Care/*organization & administration %X The availability of psychometrically sound and clinically relevant screening, diagnosis, and outcome evaluation tools is essential to high-quality palliative care assessment and management. Such data will enable us to improve patient evaluations, prognoses, and treatment selections, and to increase patient satisfaction and quality of life.
To accomplish these goals, medical care needs more precise, efficient, and comprehensive tools for data acquisition, analysis, interpretation, and management. We describe a system for interactive assessment and management in palliative care (SIAM-PC), which is patient centered, model driven, database derived, evidence based, and technology assisted. The SIAM-PC is designed to reliably measure the multiple dimensions of patients' needs for palliative care, and then to provide information to clinicians, patients, and the patients' families to achieve optimal patient care, while improving our capacity for doing palliative care research. This system is innovative in its application of the state-of-the-science approaches, such as item response theory and computerized adaptive testing, to many of the significant clinical problems related to palliative care. %B Journal of Pain and Symptom Management %7 2007/03/16 %V 33 %P 745-55 %@ 0885-3924 (Print) %G eng %M 17360148 %0 Journal Article %J Applied Psychological Measurement %D 2007 %T Two-Phase Item Selection Procedure for Flexible Content Balancing in CAT %A Cheng, Ying %A Chang, Hua-Hua %A Yi, Qing %X

Content balancing is an important issue in the design and implementation of computerized adaptive testing (CAT). Content-balancing techniques that have been applied in fixed content balancing, where the number of items from each content area is fixed, include constrained CAT (CCAT), the modified multinomial model (MMM), modified constrained CAT (MCCAT), and others. In this article, four methods are proposed to address the flexible content-balancing issue with the a-stratification design, named STR_C. The four methods are MMM+, an extension of MMM; MCCAT+, an extension of MCCAT; the TPM method, a two-phase content-balancing method using MMM in both phases; and the TPF method, a two-phase content-balancing method using MMM in the first phase and MCCAT in the second. Simulation results show that all of the methods work well in content balancing, and TPF performs the best in item exposure control and item pool utilization while maintaining measurement precision.

%B Applied Psychological Measurement %V 31 %P 467-482 %U http://apm.sagepub.com/content/31/6/467.abstract %R 10.1177/0146621606292933 %0 Conference Proceedings %B American Evaluation Association %D 2007 %T The use of computerized adaptive testing to assess psychopathology using the Global Appraisal of Individual Needs %A Conrad, K. J. %A Riley, B. B. %A Dennis, M. L. %B American Evaluation Association %I American Evaluation Association %C Portland, OR USA %8 November %G eng %0 Journal Article %J Applied Psychological Measurement %D 2006 %T Assessing CAT Test Security Severity %A Yi, Q. %A Zhang, J. %A Chang, Hua-Hua %B Applied Psychological Measurement %V 30 %N 1 %P 62-63 %G eng %0 Journal Article %J Journal of Educational Measurement %D 2006 %T Comparing methods of assessing differential item functioning in a computerized adaptive testing environment %A Lei, P-W. %A Chen, S-Y. %A Yu, L. %K computerized adaptive testing %K educational testing %K item response theory likelihood ratio test %K logistic regression %K trait estimation %K unidirectional & non-unidirectional differential item functioning %X Mantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item-response-theory-likelihood-ratio test (IRT-LRT), capable of detecting both unidirectional and non-unidirectional DIF, to the CAT environment in which pretest items are assumed to be seeded in CATs but not used for trait estimation. The proposed adaptation methods were evaluated with simulated data under different sample size ratios and impact conditions in terms of Type I error, power, and specificity in identifying the form of DIF. The adapted LR and IRT-LRT procedures are more powerful than the CAT version of SIBTEST for non-unidirectional DIF detection.
The good Type I error control provided by IRT-LRT under extremely unequal sample sizes and large impact is encouraging. Implications of these and other findings are discussed. %B Journal of Educational Measurement %I Blackwell Publishing: United Kingdom %V 43 %P 245-264 %@ 0022-0655 (Print) %G eng %M 2006-10742-004 %U http://dx.doi.org/10.1111/j.1745-3984.2006.00015.x %R 10.1111/j.1745-3984.2006.00015.x %0 Book Section %B Handbook of multimethod measurement in psychology %D 2006 %T Computer-based testing %A Drasgow, F. %A Chuah, S. C. %K Adaptive Testing %K computerized adaptive testing %K Computer Assisted Testing %K Experimentation %K Psychometrics %K Theories %X (From the chapter) There has been a proliferation of research designed to explore and exploit opportunities provided by computer-based assessment. This chapter provides an overview of the diverse efforts by researchers in this area. It begins by describing how paper-and-pencil tests can be adapted for administration by computers. Computerization provides the important advantage that items can be selected so they are of appropriate difficulty for each examinee. Some of the psychometric theory needed for computerized adaptive testing is reviewed. Then research on innovative computerized assessments is summarized. These assessments go beyond multiple-choice items by using formats made possible by computerization. Then some hardware and software issues are described, and finally, directions for future work are outlined. (PsycINFO Database Record (c) 2006 APA) %B Handbook of multimethod measurement in psychology %I American Psychological Association %C Washington D.C. USA %V xiv %P 87-100 %G eng %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2006 %T Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes %A Haley, S. M. %A Siebens, H. %A Coster, W. J. %A Tao, W. %A Black-Schaffer, R. M. %A Gandek, B. %A Sinclair, S. J. %A Ni, P.
%K *Activities of Daily Living %K *Adaptation, Physiological %K *Computer Systems %K *Questionnaires %K Adult %K Aged %K Aged, 80 and over %K Chi-Square Distribution %K Factor Analysis, Statistical %K Female %K Humans %K Longitudinal Studies %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %K Patient Discharge %K Prospective Studies %K Rehabilitation/*standards %K Subacute Care/*standards %X OBJECTIVE: To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home. DESIGN: Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit. SETTING: Follow-up visits conducted in patients' home setting. PARTICIPANTS: Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from AM-PAC-CAT, including 3 activity domains of movement and physical, personal care and instrumental, and applied cognition were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66). RESULTS: AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77-.86) with scores from the AM-PAC-66. On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. The standardized response mean (SRM) was greater for the movement and physical fixed form than the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity between CAT and fixed formats. 
Using patients' own report as an anchor-based measure of change, the CAT and fixed length formats were comparable in responsiveness to patient-reported change over a 3-month interval. CONCLUSIONS: Accurate estimates for functional activity group-level changes can be obtained from CAT administrations, with a considerable reduction in administration time. %B Archives of Physical Medicine and Rehabilitation %7 2006/08/01 %V 87 %P 1033-42 %8 Aug %@ 0003-9993 (Print) %G eng %M 16876547 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2006 %T Constraints-weighted information method for item selection of severely constrained computerized adaptive testing %A Cheng, Y %A Chang, Hua-Hua %A Wang, X. B. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco %G eng %0 Journal Article %J Journal of Applied Measurement %D 2006 %T Expansion of a physical function item bank and development of an abbreviated form for clinical research %A Bode, R. K. %A Lai, J-S. %A Dineen, K. %A Heinemann, A. W. %A Shevrin, D. %A Von Roenn, J. %A Cella, D. %K clinical research %K computerized adaptive testing %K performance levels %K physical function item bank %K Psychometrics %K test reliability %K Test Validity %X We expanded an existing 33-item physical function (PF) item bank with a sufficient number of items to enable computerized adaptive testing (CAT). Ten items were written to expand the bank and the new item pool was administered to 295 people with cancer. For this analysis of the new pool, seven poorly performing items were identified for further examination. This resulted in a bank with items that define an essentially unidimensional PF construct, cover a wide range of that construct, reliably measure the PF of persons with cancer, and distinguish differences in self-reported functional performance levels. 
We also developed a 5-item (static) assessment form ("BriefPF") that can be used in clinical research to express scores on the same metric as the overall bank. The BriefPF was compared to the PF-10 from the Medical Outcomes Study SF-36. Both short forms significantly differentiated persons across functional performance levels. While the entire bank was more precise across the PF continuum than either short form, there were differences in the area of the continuum in which each short form was more precise: the BriefPF was more precise than the PF-10 at the lower functional levels and the PF-10 was more precise than the BriefPF at the higher levels. Future research on this bank will include the development of a CAT version, the PF-CAT. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Applied Measurement %I Richard M Smith: US %V 7 %P 1-15 %@ 1529-7713 (Print) %G eng %M 2006-01262-001 %0 Journal Article %J Quality of Life Research %D 2006 %T Factor analysis techniques for assessing sufficient unidimensionality of cancer related fatigue %A Lai, J-S. %A Crane, P. K. %A Cella, D. %K *Factor Analysis, Statistical %K *Quality of Life %K Aged %K Chicago %K Fatigue/*etiology %K Female %K Humans %K Male %K Middle Aged %K Neoplasms/*complications %K Questionnaires %X BACKGROUND: Fatigue is the most common unrelieved symptom experienced by people with cancer. The purpose of this study was to examine whether cancer-related fatigue (CRF) can be summarized using a single score, that is, whether CRF is sufficiently unidimensional for measurement approaches that require or assume unidimensionality. We evaluated this question using factor analysis techniques including the theory-driven bi-factor model. 
METHODS: Five hundred fifty-five cancer patients from the Chicago metropolitan area completed a 72-item fatigue item bank, covering a range of fatigue-related concerns including intensity, frequency and interference with physical, mental, and social activities. Dimensionality was assessed using exploratory and confirmatory factor analysis (CFA) techniques. RESULTS: Exploratory factor analysis (EFA) techniques identified from 1 to 17 factors. The bi-factor model suggested that CRF was sufficiently unidimensional. CONCLUSIONS: CRF can be considered sufficiently unidimensional for applications that require unidimensionality. One such application, item response theory (IRT), will facilitate the development of short-form and computer-adaptive testing. This may further enable practical and accurate clinical assessment of CRF. %B Quality of Life Research %V 15 %P 1179-90 %8 Sep %G eng %M 17001438 %0 Journal Article %J Applied Measurement in Education %D 2006 %T How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation %A Chuah, Siang Chee %A Drasgow, F. %A Luecht, Richard %B Applied Measurement in Education %V 19 %P 241-255 %U http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_5 %R 10.1207/s15324818ame1903_5 %0 Journal Article %J Medical Care %D 2006 %T Item banks and their potential applications to health status assessment in diverse populations %A Hahn, E. A. %A Cella, D. %A Bode, R. K. %A Gershon, R. C. %A Lai, J. S. %X In the context of an ethnically diverse, aging society, attention is increasingly turning to health-related quality of life measurement to evaluate healthcare and treatment options for chronic diseases. When evaluating and treating symptoms and concerns such as fatigue, pain, or physical function, reliable and accurate assessment is a priority.
Modern psychometric methods have enabled us to move from long, static tests that provide inefficient and often inaccurate assessment of individual patients, to computerized adaptive tests (CATs) that can precisely measure individuals on health domains of interest. These modern methods, collectively referred to as item response theory (IRT), can produce calibrated "item banks" from larger pools of questions. From these banks, CATs can be conducted on individuals to produce their scores on selected domains. Item banks allow for comparison of patients across different question sets because the patient's score is expressed on a common scale. Other advantages of using item banks include flexibility in terms of the degree of precision desired; interval measurement properties under most circumstances; realistic capability for accurate individual assessment over time (using CAT); and measurement equivalence across different patient populations. This work summarizes the process used in the creation and evaluation of item banks and reviews their potential contributions and limitations regarding outcome assessment and patient care, particularly when they are applied across people of different cultural backgrounds. %B Medical Care %V 44 %P S189-S197 %8 Nov %G eng %M 17060827 %0 Journal Article %J Applied Measurement in Education %D 2006 %T Multistage Testing: Widely or Narrowly Applicable? %A Stark, Stephen %A Chernyshenko, Oleksandr S. %B Applied Measurement in Education %V 19 %P 257-260 %U http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_6 %R 10.1207/s15324818ame1903_6 %0 Journal Article %J Journal of Clinical Epidemiology %D 2006 %T Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function %A Hart, D. L. %A Cook, K. F. %A Mioduski, J. E. %A Teal, C. R. %A Crane, P. K. 
%K *Computer Simulation %K *Range of Motion, Articular %K Activities of Daily Living %K Adult %K Aged %K Aged, 80 and over %K Factor Analysis, Statistical %K Female %K Humans %K Male %K Middle Aged %K Prospective Studies %K Reproducibility of Results %K Research Support, N.I.H., Extramural %K Research Support, U.S. Gov't, Non-P.H.S. %K Shoulder Dislocation/*physiopathology/psychology/rehabilitation %K Shoulder Pain/*physiopathology/psychology/rehabilitation %K Shoulder/*physiopathology %K Sickness Impact Profile %K Treatment Outcome %X BACKGROUND AND OBJECTIVE: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (theta(IRT)) and measures generated using the simulated CAT (theta(CAT)). STUDY DESIGN AND SETTING: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items. RESULTS: Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The theta(IRT) and theta(CAT) measures were highly correlated (r = .96) and resulted in similar classifications of patients. CONCLUSION: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability. 
%B Journal of Clinical Epidemiology %V 59 %P 290-298 %G eng %N 3 %M 16488360 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2005 %T Assessing mobility in children using a computer adaptive testing version of the pediatric evaluation of disability inventory %A Haley, S. M. %A Raczek, A. E. %A Coster, W. J. %A Dumas, H. M. %A Fragala-Pinkham, M. A. %K *Computer Simulation %K *Disability Evaluation %K Adolescent %K Child %K Child, Preschool %K Cross-Sectional Studies %K Disabled Children/*rehabilitation %K Female %K Humans %K Infant %K Male %K Outcome Assessment (Health Care)/*methods %K Rehabilitation Centers %K Rehabilitation/*standards %K Sensitivity and Specificity %X OBJECTIVE: To assess score agreement, validity, precision, and response burden of a prototype computerized adaptive testing (CAT) version of the Mobility Functional Skills Scale (Mob-CAT) of the Pediatric Evaluation of Disability Inventory (PEDI) as compared with the full 59-item version (Mob-59). DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; and cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics, community-based day care, preschool, and children's homes. PARTICIPANTS: Four hundred sixty-nine children with disabilities and 412 children with no disabilities (analytic sample); 41 children without disabilities and 39 with disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from a prototype Mob-CAT application and versions using 15-, 10-, and 5-item stopping rules; scores from the Mob-59; and number of items and time (in seconds) to administer assessments. RESULTS: Mob-CAT scores from both computer simulations (intraclass correlation coefficient [ICC] range, .94-.99) and field administrations (ICC=.98) were in high agreement with scores from the Mob-59.
Using computer simulations of retrospective data, the discriminant validity and sensitivity to change of the Mob-CAT closely approximated those of the Mob-59, especially when using the 15- and 10-item stopping rule versions of the Mob-CAT. The Mob-CAT used no more than 15% of the items for any single administration, and required 20% of the time needed to administer the Mob-59. CONCLUSIONS: Comparable score estimates for the PEDI mobility scale can be obtained from CAT administrations, with losses in validity and precision for shorter forms, but with a considerable reduction in administration time. %B Archives of Physical Medicine and Rehabilitation %7 2005/05/17 %V 86 %P 932-939 %8 May %@ 0003-9993 (Print) %G eng %M 15895339 %0 Journal Article %J Educational Technology & Society %D 2005 %T An Authoring Environment for Adaptive Testing %A Guzmán, E. %A Conejo, R. %A García-Hervás, E. %K Adaptability %K Adaptive Testing %K Authoring environment %K Item Response Theory %X

SIETTE is a web-based adaptive testing system that implements computerized adaptive tests: tailor-made, theory-based tests in which the questions shown to students, the finalization of the test, and the estimation of student knowledge are all accomplished adaptively. To construct these tests, SIETTE provides an authoring environment comprising a suite of tools that helps teachers create questions and tests properly and analyze students' performance after a test. In this paper, we present this authoring environment in the framework of adaptive testing. As will be shown, this set of visual tools, which includes some adaptable features, can be useful for teachers lacking skills in this kind of testing. Additionally, other systems that implement adaptive testing are reviewed.

%B Educational Technology & Society %V 8 %P 66-76 %G eng %N 3 %0 Journal Article %J Journal of Educational Measurement %D 2005 %T A closer look at using judgments of item difficulty to change answers on computerized adaptive tests %A Vispoel, W. P. %A Clough, S. J. %A Bleiler, T. %B Journal of Educational Measurement %V 42 %P 331-350 %0 Journal Article %J Developmental Medicine and Child Neurology %D 2005 %T A computer adaptive testing approach for assessing physical functioning in children and adolescents %A Haley, S. M. %A Ni, P. %A Fragala-Pinkham, M. A. %A Skrinar, A. M. %A Corzo, D. %K *Computer Systems %K Activities of Daily Living %K Adolescent %K Age Factors %K Child %K Child Development/*physiology %K Child, Preschool %K Computer Simulation %K Confidence Intervals %K Demography %K Female %K Glycogen Storage Disease Type II/physiopathology %K Health Status Indicators %K Humans %K Infant %K Infant, Newborn %K Male %K Motor Activity/*physiology %K Outcome Assessment (Health Care)/*methods %K Reproducibility of Results %K Self Care %K Sensitivity and Specificity %X The purpose of this article is to demonstrate: (1) the accuracy and (2) the reduction in amount of time and effort in assessing physical functioning (self-care and mobility domains) of children and adolescents using computer-adaptive testing (CAT). A CAT algorithm selects questions directly tailored to the child's ability level, based on previous responses. Using a CAT algorithm, a simulation study was used to determine the number of items necessary to approximate the score of a full-length assessment. We built simulated CAT (5-, 10-, 15-, and 20-item versions) for self-care and mobility domains and tested their accuracy in a normative sample (n=373; 190 males, 183 females; mean age 6y 11mo [SD 4y 2mo], range 4mo to 14y 11mo) and a sample of children and adolescents with Pompe disease (n=26; 21 males, 5 females; mean age 6y 1mo [SD 3y 10mo], range 5mo to 14y 10mo).
Results indicated that score estimates comparable to those of the full-length tests (based on computer simulations) can be achieved with a 20-item CAT version for all age ranges and for normative and clinical samples. No more than 13 to 16% of the items in the full-length tests were needed for any one administration. These results support further consideration of using CAT programs for accurate and efficient clinical assessments of physical functioning. %B Developmental Medicine and Child Neurology %7 2005/02/15 %V 47 %P 113-120 %8 Feb %@ 0012-1622 (Print) %G eng %M 15707234 %0 Journal Article %J British Journal of Mathematical and Statistical Psychology %D 2005 %T Computerized adaptive testing: a mixture item selection approach for constrained situations %A Leung, C. K. %A Chang, Hua-Hua %A Hau, K. T. %K *Computer-Aided Design %K *Educational Measurement/methods %K *Models, Psychological %K Humans %K Psychometrics/methods %X In computerized adaptive testing (CAT), traditionally the most discriminating items are selected to provide the maximum information so as to attain the highest efficiency in trait (theta) estimation. The maximum information (MI) approach typically results in unbalanced item exposure and hence high item-overlap rates across examinees. Recently, Yi and Chang (2003) proposed the multiple stratification (MS) method to remedy the shortcomings of MI. In MS, items are first sorted according to content, then difficulty, and finally discrimination parameters. As discriminating items are used strategically, MS offers a better utilization of the entire item pool. However, for testing with imposed non-statistical constraints, this new stratification approach may not maintain its high efficiency. Through a series of simulation studies, this research explored the possible benefits of a mixture item selection approach (MS-MI), integrating the MS and MI approaches, in testing with non-statistical constraints.
In all simulation conditions, MS consistently outperformed the other two competing approaches in item pool utilization, while the MS-MI and the MI approaches yielded higher measurement efficiency and offered better conformity to the constraints. Furthermore, the MS-MI approach was shown to perform better than MI on all evaluation criteria when control of item exposure was imposed. %B British Journal of Mathematical and Statistical Psychology %7 2005/11/19 %V 58 %P 239-257 %8 Nov %@ 0007-1102 (Print) 0007-1102 (Linking) %G eng %M 16293199 %0 Journal Article %J Applied Psychological Measurement %D 2005 %T Controlling Item Exposure and Test Overlap in Computerized Adaptive Testing %A Chen, Shu-Ying %A Lei, Pui-Wa %X

This article proposes an item exposure control method, which is an extension of the Sympson and Hetter procedure and can provide item exposure control at both the item and test levels. Item exposure rate and test overlap rate are two indices commonly used to track item exposure in computerized adaptive tests. By considering both indices, item exposure can be monitored at both the item and test levels. To control the item exposure rate and test overlap rate simultaneously, the modified procedure attempted to control not only the maximum value but also the variance of item exposure rates. Results indicated that the item exposure rate and test overlap rate could be controlled simultaneously by implementing the modified procedure. Item exposure control was improved, and precision of trait estimation decreased, when a prespecified maximum test overlap rate was stringent.

%B Applied Psychological Measurement %V 29 %P 204-217 %U http://apm.sagepub.com/content/29/3/204.abstract %R 10.1177/0146621604271495 %0 Journal Article %J Evaluation and the Health Professions %D 2005 %T Data pooling and analysis to build a preliminary item bank: an example using bowel function in prostate cancer %A Eton, D. T. %A Lai, J. S. %A Cella, D. %A Reeve, B. B. %A Talcott, J. A. %A Clark, J. A. %A McPherson, C. P. %A Litwin, M. S. %A Moinpour, C. M. %K *Quality of Life %K *Questionnaires %K Adult %K Aged %K Data Collection/methods %K Humans %K Intestine, Large/*physiopathology %K Male %K Middle Aged %K Prostatic Neoplasms/*physiopathology %K Psychometrics %K Research Support, Non-U.S.
Gov't %K Statistics, Nonparametric %X Assessing bowel function (BF) in prostate cancer can help determine therapeutic trade-offs. We determined the components of BF commonly assessed in prostate cancer studies as an initial step in creating an item bank for clinical and research application. We analyzed six archived data sets representing 4,246 men with prostate cancer. Thirty-one items from validated instruments were available for analysis. Items were classified into domains (diarrhea, rectal urgency, pain, bleeding, bother/distress, and other) then subjected to conventional psychometric and item response theory (IRT) analyses. Items fit the IRT model if the ratio between observed and expected item variance was between 0.60 and 1.40. Four of 31 items had inadequate fit in at least one analysis. Poorly fitting items included bleeding (2), rectal urgency (1), and bother/distress (1). A fifth item assessing hemorrhoids was poorly correlated with other items. Our analyses supported four related components of BF: diarrhea, rectal urgency, pain, and bother/distress. %B Evaluation and the Health Professions %V 28 %P 142-59 %G eng %M 15851770 %0 Journal Article %J Health Services Research %D 2005 %T Dynamic assessment of health outcomes: Time to let the CAT out of the bag? %A Cook, K. F. %A O'Malley, K. J. %A Roddey, T. S. %K computer adaptive testing %K Item Response Theory %K self reported health outcomes %X Background: The use of item response theory (IRT) to measure self-reported outcomes has burgeoned in recent years. Perhaps the most important application of IRT is computer-adaptive testing (CAT), a measurement approach in which the selection of items is tailored for each respondent. Objective. To provide an introduction to the use of CAT in the measurement of health outcomes, describe several IRT models that can be used as the basis of CAT, and discuss practical issues associated with the use of adaptive scaling in research settings. 
Principal Points: The development of a CAT requires several steps that are not required in the development of a traditional measure including identification of "starting" and "stopping" rules. CAT's most attractive advantage is its efficiency. Greater measurement precision can be achieved with fewer items. Disadvantages of CAT include the high cost and level of technical expertise required to develop a CAT. Conclusions: Researchers, clinicians, and patients benefit from the availability of psychometrically rigorous measures that are not burdensome. CAT outcome measures hold substantial promise in this regard, but their development is not without challenges. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Health Services Research %I Blackwell Publishing: United Kingdom %V 40 %P 1694-1711 %@ 0017-9124 (Print); 1475-6773 (Electronic) %G eng %M 2006-02162-008 %0 Conference Paper %B Annual meeting of the National Council on Measurement in Education %D 2005 %T The effectiveness of using multiple item pools in computerized adaptive testing %A Zhang, J. %A Chang, H. %B Annual meeting of the National Council on Measurement in Education %C Montreal, Canada %8 04/2005 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education (NCME) %D 2005 %T Identifying practical indices for enhancing item pool security %A Yi, Q. %A Zhang, J. %A Chang, Hua-Hua %B Paper presented at the annual meeting of the National Council on Measurement in Education (NCME) %C Montreal, Canada %G eng %0 Generic %D 2005 %T Implementing content constraints in alpha-stratified adaptive testing using a shadow test approach %A van der Linden, W. J. %A Chang, Hua-Hua %C Law School Admission Council, Computerized Testing Report 01-09 %G eng %0 Journal Article %J Journal of Clinical Epidemiology %D 2005 %T An item bank was created to improve the measurement of cancer-related fatigue %A Lai, J-S. %A Cella, D. %A Dineen, K. %A Bode, R. 
%A Von Roenn, J. %A Gershon, R. C. %A Shevrin, D. %K Adult %K Aged %K Aged, 80 and over %K Factor Analysis, Statistical %K Fatigue/*etiology/psychology %K Female %K Humans %K Male %K Middle Aged %K Neoplasms/*complications/psychology %K Psychometrics %K Questionnaires %X OBJECTIVE: Cancer-related fatigue (CRF) is one of the most common unrelieved symptoms experienced by patients. CRF is underrecognized and undertreated due to a lack of clinically sensitive instruments that integrate easily into clinics. Modern computerized adaptive testing (CAT) can overcome these obstacles by enabling precise assessment of fatigue without requiring the administration of a large number of questions. A working item bank is essential for development of a CAT platform. The present report describes the building of an operational item bank for use in clinical settings with the ultimate goal of improving CRF identification and treatment. STUDY DESIGN AND SETTING: The sample included 301 cancer patients. Psychometric properties of items were examined by using Rasch analysis, an Item Response Theory (IRT) model. RESULTS AND CONCLUSION: The final bank includes 72 items. These 72 unidimensional items explained 57.5% of the variance, based on factor analysis results. Excellent internal consistency (alpha=0.99) and acceptable item-total correlation were found (range: 0.51-0.85). The 72 items covered a reasonable range of the fatigue continuum. No significant ceiling effects, floor effects, or gaps were found. A sample short form was created for demonstration purposes. The resulting bank is amenable to the development of a CAT platform. %B Journal of Clinical Epidemiology %7 2005/02/01 %V 58 %P 190-7 %8 Feb %@ 0895-4356 (Print)0895-4356 (Linking) %G eng %9 Multicenter Study %M 15680754 %0 Journal Article %J Journal of Pain and Symptom Management %D 2005 %T An item response theory-based pain item bank can enhance measurement precision %A Lai, J-S. %A Dineen, K. %A Reeve, B. B. %A Von Roenn, J. 
%A Shervin, D. %A McGuire, M. %A Bode, R. K. %A Paice, J. %A Cella, D. %K computerized adaptive testing %X Cancer-related pain is often under-recognized and undertreated. This is partly due to the lack of appropriate assessments, which need to be comprehensive and precise yet easily integrated into clinics. Computerized adaptive testing (CAT) can enable precise-yet-brief assessments by only selecting the most informative items from a calibrated item bank. The purpose of this study was to create such a bank. The sample included 400 cancer patients who were asked to complete 61 pain-related items. Data were analyzed using factor analysis and the Rasch model. The final bank consisted of 43 items which satisfied the measurement requirement of factor analysis and the Rasch model, demonstrated high internal consistency and reasonable item-total correlations, and discriminated patients with differing degrees of pain. We conclude that this bank demonstrates good psychometric properties, is sensitive to pain reported by patients, and can be used as the foundation for a CAT pain-testing platform for use in clinical practice. %B Journal of Pain and Symptom Management %V 30 %P 278-88 %G eng %M 16183012 %0 Journal Article %J American Journal of Physical Medicine and Rehabilitation %D 2005 %T Measuring physical function in patients with complex medical and postsurgical conditions: a computer adaptive approach %A Siebens, H. %A Andres, P. L. %A Pengsheng, N. %A Coster, W. J. %A Haley, S. M. 
%K Activities of Daily Living/*classification %K Adult %K Aged %K Cohort Studies %K Continuity of Patient Care %K Disability Evaluation %K Female %K Health Services Research %K Humans %K Male %K Middle Aged %K Postoperative Care/*rehabilitation %K Prognosis %K Recovery of Function %K Rehabilitation Centers %K Rehabilitation/*standards %K Sensitivity and Specificity %K Sickness Impact Profile %K Treatment Outcome %X OBJECTIVE: To examine whether the range of disability in the medically complex and postsurgical populations receiving rehabilitation is adequately sampled by the new Activity Measure--Post-Acute Care (AM-PAC), and to assess whether computer adaptive testing (CAT) can derive valid patient scores using fewer questions. DESIGN: Observational study of 158 subjects (mean age 67.2 yrs) receiving skilled rehabilitation services in inpatient (acute rehabilitation hospitals, skilled nursing facility units) and community (home health services, outpatient departments) settings for recent-onset or worsening disability from medical (excluding neurological) and surgical (excluding orthopedic) conditions. Measures were interviewer-administered activity questions (all patients) and physical functioning portion of the SF-36 (outpatients) and standardized chart items (11 Functional Independence Measure (FIM), 19 Standardized Outcome and Assessment Information Set (OASIS) items, and 22 Minimum Data Set (MDS) items). Rasch modeling analyzed all data and the relationship between person ability estimates and average item difficulty. CAT assessed the ability to derive accurate patient scores using a sample of questions. RESULTS: The 163-item activity item pool covered the range of physical movement and personal and instrumental activities. CAT analysis showed comparable scores between estimates using 10 items or the total item pool. CONCLUSION: The AM-PAC can assess a broad range of function in patients with complex medical illness. 
CAT achieves valid patient scores using fewer questions. %B American Journal of Physical Medicine and Rehabilitation %V 84 %P 741-8 %8 Oct %G eng %M 16205429 %0 Journal Article %J Clinical and Experimental Rheumatology %D 2005 %T The promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes %A Fries, J.F. %A Bruce, B. %A Cella, D. %K computerized adaptive testing %X PROMIS (Patient-Reported-Outcomes Measurement Information System) is an NIH Roadmap network project intended to improve the reliability, validity, and precision of PROs and to provide definitive new instruments that will exceed the capabilities of classic instruments and enable improved outcome measurement for clinical research across all NIH institutes. Item response theory (IRT) measurement models now permit us to transition conventional health status assessment into an era of item banking and computerized adaptive testing (CAT). Item banking uses IRT measurement models and methods to develop item banks from large pools of items from many available questionnaires. IRT allows the reduction and improvement of items and assembles domains of items which are unidimensional and not excessively redundant. CAT provides a model-driven algorithm and software to iteratively select the most informative remaining item in a domain until a desired degree of precision is obtained. Through these approaches the number of patients required for a clinical trial may be reduced while holding statistical power constant. PROMIS tools, expected to improve precision and enable assessment at the individual patient level which should broaden the appeal of PROs, will begin to be available to the general medical community in 2008. %B Clinical and Experimental Rheumatology %V 23 %P S53-7 %G eng %M 16273785 %0 Conference Paper %B National Council on Measurement in Education %D 2005 %T Rescuing CAT by fixing the problems %A Chang, S-H. %A Zhang, J. 
%B National Council on Measurement in Education %C Montreal, Canada %G eng %0 Journal Article %J Alcoholism: Clinical & Experimental Research %D 2005 %T Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire %A Kahler, C. W. %A Strong, D. R. %A Read, J. P. %K Psychometrics %K Substance-Related Disorders %X Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias. Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis.
An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample., Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items., Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided., (C)2005Research Society on AlcoholismAn important, sometimes controversial feature of all psychological phenomena is whether they are categorical or dimensional. A conceptual and psychometric framework is described for distinguishing whether the latent structure behind manifest categories (e.g., psychiatric diagnoses, attitude groups, or stages of development) is category-like or dimension-like. Being dimension-like requires (a) within-category heterogeneity and (b) between-category quantitative differences. Being category-like requires (a) within-category homogeneity and (b) between-category qualitative differences. The relation between this classification and abrupt versus smooth differences is discussed. Hybrid structures are possible. 
Being category-like is itself a matter of degree; the authors offer a formalized framework to determine this degree. Empirical applications to personality disorders, attitudes toward capital punishment, and stages of cognitive development illustrate the approach., (C) 2005 by the American Psychological AssociationThe authors conducted Rasch model ( G. Rasch, 1960) analyses of items from the Young Adult Alcohol Problems Screening Test (YAAPST; S. C. Hurlbut & K. J. Sher, 1992) to examine the relative severity and ordering of alcohol problems in 806 college students. Items appeared to measure a single dimension of alcohol problem severity, covering a broad range of the latent continuum. Items fit the Rasch model well, with less severe symptoms reliably preceding more severe symptoms in a potential progression toward increasing levels of problem severity. However, certain items did not index problem severity consistently across demographic subgroups. A shortened, alternative version of the YAAPST is proposed, and a norm table is provided that allows for a linking of total YAAPST scores to expected symptom expression., (C) 2004 by the American Psychological AssociationA didactic on latent growth curve modeling for ordinal outcomes is presented. The conceptual aspects of modeling growth with ordinal variables and the notion of threshold invariance are illustrated graphically using a hypothetical example. The ordinal growth model is described in terms of 3 nested models: (a) multivariate normality of the underlying continuous latent variables (yt) and its relationship with the observed ordinal response pattern (Yt), (b) threshold invariance over time, and (c) growth model for the continuous latent variable on a common scale. Algebraic implications of the model restrictions are derived, and practical aspects of fitting ordinal growth models are discussed with the help of an empirical example and Mx script ( M. C. Neale, S. M. Boker, G. Xie, & H. H. Maes, 1999). 
The necessary conditions for the identification of growth models with ordinal data and the methodological implications of the model of threshold invariance are discussed., (C) 2004 by the American Psychological AssociationRecent research points toward the viability of conceptualizing alcohol problems as arrayed along a continuum. Nevertheless, modern statistical techniques designed to scale multiple problems along a continuum (latent trait modeling; LTM) have rarely been applied to alcohol problems. This study applies LTM methods to data on 110 problems reported during in-person interviews of 1,348 middle-aged men (mean age = 43) from the general population. The results revealed a continuum of severity linking the 110 problems, ranging from heavy and abusive drinking, through tolerance and withdrawal, to serious complications of alcoholism. These results indicate that alcohol problems can be arrayed along a dimension of severity and emphasize the relevance of LTM to informing the conceptualization and assessment of alcohol problems., (C) 2004 by the American Psychological AssociationItem response theory (IRT) is supplanting classical test theory as the basis for measures development. This study demonstrated the utility of IRT for evaluating DSM-IV diagnostic criteria. Data on alcohol, cannabis, and cocaine symptoms from 372 adult clinical participants interviewed with the Composite International Diagnostic Interview-Expanded Substance Abuse Module (CIDI-SAM) were analyzed with Mplus ( B. Muthen & L. Muthen, 1998) and MULTILOG ( D. Thissen, 1991) software. Tolerance and legal problems criteria were dropped because of poor fit with a unidimensional model. Item response curves, test information curves, and testing of variously constrained models suggested that DSM-IV criteria in the CIDI-SAM discriminate between only impaired and less impaired cases and may not be useful to scale case severity. 
IRT can be used to study the construct validity of DSM-IV diagnoses and to identify diagnostic criteria with poor performance., (C) 2004 by the American Psychological AssociationThis study examined the psychometric characteristics of an index of substance use involvement using item response theory. The sample consisted of 292 men and 140 women who qualified for a Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.; American Psychiatric Association, 1987) substance use disorder (SUD) diagnosis and 293 men and 445 women who did not qualify for a SUD diagnosis. The results indicated that men had a higher probability of endorsing substance use compared with women. The index significantly predicted health, psychiatric, and psychosocial disturbances as well as level of substance use behavior and severity of SUD after a 2-year follow-up. Finally, this index is a reliable and useful prognostic indicator of the risk for SUD and the medical and psychosocial sequelae of drug consumption., (C) 2002 by the American Psychological AssociationComparability, validity, and impact of loss of information of a computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 140 Veterans Affairs hospital patients. The countdown method ( Butcher, Keller, & Bacon, 1985) was used to adaptively administer Scales L (Lie) and F (Frequency), the 10 clinical scales, and the 15 content scales. Participants completed the MMPI-2 twice, in 1 of 2 conditions: computerized conventional test-retest, or computerized conventional-computerized adaptive. Mean profiles and test-retest correlations across modalities were comparable. Correlations between MMPI-2 scales and criterion measures supported the validity of the countdown method, although some attenuation of validity was suggested for certain health-related items. Loss of information incurred with this mode of adaptive testing has minimal impact on test validity. 
Item and time savings were substantial. (C) 1999 by the American Psychological Association %B Alcoholism: Clinical & Experimental Research %V 29 %P 1180-1189 %G eng %0 Journal Article %J Psychological Assessment %D 2005 %T Validation of a computerized adaptive testing version of the Schedule for Nonadaptive and Adaptive Personality (SNAP) %A Simms, L. J. %A Clark, L. A. %X This is a validation study of a computerized adaptive (CAT) version of the Schedule for Nonadaptive and Adaptive Personality (SNAP) conducted with 413 undergraduates who completed the SNAP twice, 1 week apart. Participants were assigned randomly to 1 of 4 retest groups: (a) paper-and-pencil (P&P) SNAP, (b) CAT, (c) P&P/CAT, and (d) CAT/P&P. 
With number of items held constant, computerized administration had little effect on descriptive statistics, rank ordering of scores, reliability, and concurrent validity, but was preferred over P&P administration by most participants. CAT administration yielded somewhat lower precision and validity than P&P administration, but required 36% to 37% fewer items and 58% to 60% less time to complete. These results confirm not only key findings from previous CAT simulation studies of personality measures but extend them for the 1st time to a live assessment setting. %B Psychological Assessment %V 17 %N 1 %P 28-43 %G eng %0 Journal Article %J Medical Care %D 2004 %T Activity outcome measurement for postacute care %A Haley, S. M. %A Coster, W. J. %A Andres, P. L. %A Ludlow, L. H. %A Ni, P. %A Bond, T. L. %A Sinclair, S. J. %A Jette, A. M. %K *Self Efficacy %K *Sickness Impact Profile %K Activities of Daily Living/*classification/psychology %K Adult %K Aftercare/*standards/statistics & numerical data %K Aged %K Boston %K Cognition/physiology %K Disability Evaluation %K Factor Analysis, Statistical %K Female %K Human %K Male %K Middle Aged %K Movement/physiology %K Outcome Assessment (Health Care)/*methods/statistics & numerical data %K Psychometrics %K Questionnaires/standards %K Rehabilitation/*standards/statistics & numerical data %K Reproducibility of Results %K Sensitivity and Specificity %K Support, U.S. Gov't, Non-P.H.S. %K Support, U.S. Gov't, P.H.S. %X BACKGROUND: Efforts to evaluate the effectiveness of a broad range of postacute care services have been hindered by the lack of conceptually sound and comprehensive measures of outcomes. It is critical to determine a common underlying structure before employing current methods of item equating across outcome instruments for future item banking and computer-adaptive testing applications. 
OBJECTIVE: To investigate the factor structure, reliability, and scale properties of items underlying the Activity domains of the International Classification of Functioning, Disability and Health (ICF) for use in postacute care outcome measurement. METHODS: We developed a 41-item Activity Measure for Postacute Care (AM-PAC) that assessed an individual's execution of discrete daily tasks in his or her own environment across major content domains as defined by the ICF. We evaluated the reliability and discriminant validity of the prototype AM-PAC in 477 individuals in active rehabilitation programs across 4 rehabilitation settings using factor analyses, tests of item scaling, internal consistency reliability analyses, Rasch item response theory modeling, residual component analysis, and modified parallel analysis. RESULTS: Results from an initial exploratory factor analysis produced 3 distinct, interpretable factors that accounted for 72% of the variance: Applied Cognition (44%), Personal Care & Instrumental Activities (19%), and Physical & Movement Activities (9%); these 3 activity factors were verified by a confirmatory factor analysis. Scaling assumptions were met for each factor in the total sample and across diagnostic groups. Internal consistency reliability was high for the total sample (Cronbach alpha = 0.92 to 0.94), and for specific diagnostic groups (Cronbach alpha = 0.90 to 0.95). Rasch scaling, residual factor, differential item functioning, and modified parallel analyses supported the unidimensionality and goodness of fit of each unique activity domain. CONCLUSIONS: This 3-factor model of the AM-PAC can form the conceptual basis for common-item equating and computer-adaptive applications, leading to a comprehensive system of outcome instruments for postacute care settings. 
%B Medical Care %V 42 %P I49-161 %G eng %M 14707755 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2004 %T Combining computer adaptive testing technology with cognitively diagnostic assessment %A McGlohen, M. K. %A Chang, Hua-Hua %A Wills, J. T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Conference Paper %B Paper presented at the conference “Advances in Health Outcomes Measurement: Exploring the Current State and the Future of Item Response Theory, Item Banks, and Computer-Adaptive Testing” %D 2004 %T Developing tailored instruments: Item banking and computerized adaptive assessment %A Chang, C-H. %B Paper presented at the conference “Advances in Health Outcomes Measurement: Exploring the Current State and the Future of Item Response Theory, Item Banks, and Computer-Adaptive Testing” %C Bethesda MD %G eng %0 Journal Article %J Journal of Educational Measurement %D 2004 %T Effects of practical constraints on item selection rules at the early stages of computerized adaptive testing %A Chen, S-Y. %A Ankenmann, R. D. %K computerized adaptive testing %K item selection rules %K practical constraints %X The purpose of this study was to compare the effects of four item selection rules--(1) Fisher information (F), (2) Fisher information with a posterior distribution (FP), (3) Kullback-Leibler information with a posterior distribution (KP), and (4) completely randomized item selection (RN)--with respect to the precision of trait estimation and the extent of item usage at the early stages of computerized adaptive testing. The comparison of the four item selection rules was carried out under three conditions: (1) using only the item information function as the item selection criterion; (2) using both the item information function and content balancing; and (3) using the item information function, content balancing, and item exposure control. 
When test length was less than 10 items, FP and KP tended to outperform F at extreme trait levels in Condition 1. However, in more realistic settings, it could not be concluded that FP and KP outperformed F, especially when item exposure control was imposed. When test length was greater than 10 items, the three nonrandom item selection procedures performed similarly no matter what the condition was, while F had slightly higher item usage. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Educational Measurement %I Blackwell Publishing: United Kingdom %V 41 %P 149-174 %@ 0022-0655 (Print) %G eng %M 2005-04771-004 %0 Journal Article %J Applied Psychological Measurement %D 2004 %T Implementation and Measurement Efficiency of Multidimensional Computerized Adaptive Testing %A Wang, Wen-Chung %A Chen, Po-Hsi %X

Multidimensional adaptive testing (MAT) procedures are proposed for the measurement of several latent traits by a single examination. Bayesian latent trait estimation and adaptive item selection are derived. Simulations were conducted to compare the measurement efficiency of MAT with those of unidimensional adaptive testing and random administration. The results showed that the higher the correlation between latent traits, the more latent traits there were, and the more scoring levels there were in the items, the more efficient MAT was than the other two procedures. For tests containing multidimensional items, only MAT is applicable, whereas unidimensional adaptive testing is not. Issues in implementing MAT are discussed.

%B Applied Psychological Measurement %V 28 %P 295-316 %U http://apm.sagepub.com/content/28/5/295.abstract %R 10.1177/0146621604265938 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2004 %T Item parameter recovery with adaptive tests %A Do, B.-R. %A Chuah, S. C. %A Drasgow, F. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2004 %T Protecting the integrity of computer-adaptive licensure tests: Results of a legal challenge %A Cizek, G. J. %B Paper presented at the annual meeting of the American Educational Research Association %C San Diego CA %G eng %0 Journal Article %J Medical Care %D 2004 %T Refining the conceptual basis for rehabilitation outcome measurement: personal care and instrumental activities domain %A Coster, W. J. %A Haley, S. M. %A Andres, P. L. %A Ludlow, L. H. %A Bond, T. L. %A Ni, P. S. %K *Self Efficacy %K *Sickness Impact Profile %K Activities of Daily Living/*classification/psychology %K Adult %K Aged %K Aged, 80 and over %K Disability Evaluation %K Factor Analysis, Statistical %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods/statistics & numerical data %K Questionnaires/*standards %K Recovery of Function/physiology %K Rehabilitation/*standards/statistics & numerical data %K Reproducibility of Results %K Research Support, U.S. Gov't, Non-P.H.S. %K Research Support, U.S. Gov't, P.H.S. %K Sensitivity and Specificity %X BACKGROUND: Rehabilitation outcome measures routinely include content on performance of daily activities; however, the conceptual basis for item selection is rarely specified. 
These instruments differ significantly in format, number, and specificity of daily activity items and in the measurement dimensions and type of scale used to specify levels of performance. We propose that a requirement for upper limb and hand skills underlies many activities of daily living (ADL) and instrumental activities of daily living (IADL) items in current instruments, and that items selected based on this definition can be placed along a single functional continuum. OBJECTIVE: To examine the dimensional structure and content coverage of a Personal Care and Instrumental Activities item set and to examine the comparability of items from existing instruments and a set of new items as measures of this domain. METHODS: Participants (N = 477) from 3 different disability groups and 4 settings representing the continuum of postacute rehabilitation care were administered the newly developed Activity Measure for Post-Acute Care (AM-PAC), the SF-8, and an additional setting-specific measure: FIM (in-patient rehabilitation); MDS (skilled nursing facility); MDS-PAC (postacute settings); OASIS (home care); or PF-10 (outpatient clinic). Rasch (partial-credit model) analyses were conducted on a set of 62 items covering the Personal Care and Instrumental domain to examine item fit, item functioning, and category difficulty estimates and unidimensionality. RESULTS: After removing 6 misfitting items, the remaining 56 items fit acceptably along the hypothesized continuum. Analyses yielded different difficulty estimates for the maximum score (eg, "Independent performance") for items with comparable content from different instruments. Items showed little differential item functioning across age, diagnosis, or severity groups, and 92% of the participants fit the model. 
CONCLUSIONS: ADL and IADL items from existing rehabilitation outcomes instruments that depend on skilled upper limb and hand use can be located along a single continuum, along with the new personal care and instrumental items of the AM-PAC addressing gaps in content. Results support the validity of the proposed definition of the Personal Care and Instrumental Activities dimension of function as a guide for future development of rehabilitation outcome instruments, such as linked, setting-specific short forms and computerized adaptive testing approaches. %B Medical Care %V 42 %P I62-172 %8 Jan %G eng %M 14707756 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2004 %T Score comparability of short forms and computerized adaptive testing: Simulation study with the activity measure for post-acute care %A Haley, S. M. %A Coster, W. J. %A Andres, P. L. %A Kosinski, M. %A Ni, P. %K Boston %K Factor Analysis, Statistical %K Humans %K Outcome Assessment (Health Care)/*methods %K Prospective Studies %K Questionnaires/standards %K Rehabilitation/*standards %K Subacute Care/*standards %X OBJECTIVE: To compare simulated short-form and computerized adaptive testing (CAT) scores to scores obtained from complete item sets for each of the 3 domains of the Activity Measure for Post-Acute Care (AM-PAC). DESIGN: Prospective study. SETTING: Six postacute health care networks in the greater Boston metropolitan area, including inpatient acute rehabilitation, transitional care units, home care, and outpatient services. PARTICIPANTS: A convenience sample of 485 adult volunteers who were receiving skilled rehabilitation services. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Inpatient and community-based short forms and CAT applications were developed for each of 3 activity domains (physical & mobility, personal care & instrumental, applied cognition) using item pools constructed from new items and items from existing postacute care instruments. 
RESULTS: Simulated CAT scores correlated highly with score estimates from the total item pool in each domain (4- and 6-item CAT r range = .90-.95; 10-item CAT r range = .96-.98). Scores on the 10-item short forms constructed for inpatient and community settings also provided good estimates of the AM-PAC item pool scores for the physical & movement and personal care & instrumental domains, but were less consistent in the applied cognition domain. Confidence intervals around individual scores were greater in the short forms than for the CATs. CONCLUSIONS: Accurate scoring estimates for AM-PAC domains can be obtained with either the setting-specific short forms or the CATs. The strong relationship between CAT and item pool scores can be attributed to the CAT's ability to select specific items to match individual responses. The CAT may have additional advantages over short forms in practicality, efficiency, and the potential for providing more precise scoring estimates for individuals. %B Archives of Physical Medicine and Rehabilitation %7 2004/04/15 %V 85 %P 661-6 %8 Apr %@ 0003-9993 (Print) %G eng %M 15083444 %0 Journal Article %J Journal of Statistical Planning and Inference %D 2004 %T Sequential estimation in variable length computerized adaptive testing %A Chang, I. Y. %X With the advent of modern computer technology, there have been growing efforts in recent years to computerize standardized tests, including the popular Graduate Record Examination (GRE), the Graduate Management Admission Test (GMAT) and the Test of English as a Foreign Language (TOEFL). Many such computer-based tests are known as computerized adaptive tests, a major feature of which is that, depending on their performance in the course of testing, different examinees may be given different sets of items (questions). In doing so, items can be efficiently utilized to yield maximum accuracy for estimation of examinees' ability traits. 
We consider, in this article, one type of such tests where test lengths vary with examinees to yield approximately the same predetermined accuracy for all ability traits. A comprehensive large sample theory is developed for the expected test length and the sequential point and interval estimates of the latent trait. Extensive simulations are conducted with results showing that the large sample approximations are adequate for realistic sample sizes. %B Journal of Statistical Planning and Inference %V 121 %P 249-264 %@ 03783758 %G eng %0 Journal Article %J International Journal of Artificial Intelligence in Education %D 2004 %T Siette: a web-based tool for adaptive testing %A Conejo, R %A Guzmán, E %A Millán, E %A Trella, M %A Pérez-De-La-Cruz, JL %A Ríos, A %K computerized adaptive testing %B International Journal of Artificial Intelligence in Education %V 14 %P 29-61 %G eng %0 Book Section %D 2004 %T Understanding computerized adaptive testing: From Robbins-Monro to Lord and beyond %A Chang, Hua-Hua %C D. Kaplan (Ed.), The Sage handbook of quantitative methodology for the social sciences (pp. 117-133). New York: Sage. %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Alpha-stratified adaptive testing with large numbers of content constraints %A van der Linden, W. J. %A Chang, Hua-Hua %B Applied Psychological Measurement %V 27 %P 107-120 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Assessing CAT security breaches by the item pooling index %A Chang, Hua-Hua %A Zhang, J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J British Journal of Mathematical and Statistical Psychology %D 2003 %T a-Stratified multistage CAT design with content-blocking %A Yi, Q. %A Chang, H.-H. 
%B British Journal of Mathematical and Statistical Psychology %V 56 %P 359–378 %0 Journal Article %J Clinical Therapeutics %D 2003 %T Can an item response theory-based pain item bank enhance measurement precision? %A Lai, J-S. %A Dineen, K. %A Cella, D. %A Von Roenn, J. %B Clinical Therapeutics %V 25 %P D34-D36 %G eng %M 14568660 %! Clin Ther %0 Journal Article %J Journal of Educational Measurement %D 2003 %T A comparative study of item exposure control methods in computerized adaptive testing %A Chang, S-W. %A Ansley, T. N. %K Adaptive Testing %K Computer Assisted Testing %K Educational %K Item Analysis (Statistical) %K Measurement %K Strategies computerized adaptive testing %X This study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study. The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but no one possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The M. L. Stocking and C. Lewis conditional multinomial procedure and, to a slightly lesser extent, the T. Davey and C. G. Parshall method seemed to be the most promising considering all of the factors that this study addressed. 
(PsycINFO Database Record (c) 2005 APA) %B Journal of Educational Measurement %V 40 %P 71-103 %G eng %0 Journal Article %J The Journal of Technology, Learning and Assessment %D 2003 %T Computerized adaptive testing: A comparison of three content balancing methods %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %X Content balancing is often a practical consideration in the design of computerized adaptive testing (CAT). This study compared three content balancing methods, namely, the constrained CAT (CCAT), the modified constrained CAT (MCCAT), and the modified multinomial model (MMM), under various conditions of test length and target maximum exposure rate. Results of a series of simulation studies indicate that there is no systematic effect of content balancing method in measurement efficiency and pool utilization. However, among the three methods, the MMM appears to consistently over-expose fewer items. %B The Journal of Technology, Learning and Assessment %V 2 %P 1-15 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Computerized adaptive testing: A comparison of three content balancing methods %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %A Wen, Z. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Computerized adaptive testing using the nearest-neighbors criterion %A Cheng, P. E. %A Liou, M. 
%K Adaptive Testing %K Computer Assisted Testing %K Item Analysis (Statistical) %K Item Response Theory %K Statistical Analysis %K Statistical Estimation %K computerized adaptive testing %K Statistical Tests %X Item selection procedures designed for computerized adaptive testing need to accurately estimate every taker's trait level (θ) and, at the same time, effectively use all items in a bank. Empirical studies showed that classical item selection procedures based on maximizing Fisher or other related information yielded highly varied item exposure rates; with these procedures, some items were frequently used whereas others were rarely selected. In the literature, methods have been proposed for controlling exposure rates; they tend to affect the accuracy in θ estimates, however. A modified version of the maximum Fisher information (MFI) criterion, coined the nearest neighbors (NN) criterion, is proposed in this study. The NN procedure improves to a moderate extent the undesirable item exposure rates associated with the MFI criterion and keeps sufficient precision in θ estimates. The NN criterion will be compared with a few other existing methods in an empirical study using the mean squared errors in θ estimates and plots of item exposure rates associated with different distributions. (PsycINFO Database Record (c) 2005 APA) (journal abstract) %B Applied Psychological Measurement %V 27 %P 204-216 %G eng %0 Journal Article %J Journal of Applied Measurement %D 2003 %T Developing an initial physical function item bank from existing sources %A Bode, R. K. %A Cella, D. %A Lai, J. S. %A Heinemann, A. W. %K *Databases %K *Sickness Impact Profile %K Adaptation, Psychological %K Data Collection %K Humans %K Neoplasms/*physiopathology/psychology/therapy %K Psychometrics %K Quality of Life/*psychology %K Research Support, U.S. Gov't, P.H.S. 
%K United States %X The objective of this article is to illustrate incremental item banking using health-related quality of life data collected from two samples of patients receiving cancer treatment. The kinds of decisions one faces in establishing an item bank for computerized adaptive testing are also illustrated. Pre-calibration procedures include: identifying common items across databases; creating a new database with data from each pool; reverse-scoring "negative" items; identifying rating scales used in items; identifying pivot points in each rating scale; pivot anchoring items at comparable rating scale categories; and identifying items in each instrument that measure the construct of interest. A series of calibrations were conducted in which a small proportion of new items were added to the common core and misfitting items were identified and deleted until an initial item bank has been developed. %B Journal of Applied Measurement %V 4 %P 124-36 %G eng %M 12748405 %0 Journal Article %J Medical Care (in press) %D 2003 %T Development and psychometric evaluation of the Flexilevel Scale of Shoulder Function (FLEX-SF) %A Cook, K. F. %A Roddey, T. S. %A Gartsman, G M %A Olson, S L %B Medical Care (in press) %G eng %0 Generic %D 2003 %T Effect of extra time on GRE® Quantitative and Verbal Scores (Research Report 03-13) %A Bridgeman, B. %A Cline, F. %A Hessinger, J. %C Princeton NJ: Educational Testing service %0 Journal Article %J Journal of Applied Measurement %D 2003 %T An examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model %A Davis, L. L. %A Pastor, D. A. %A Dodd, B. G. %A Chiang, C. %A Fitzpatrick, S. J. 
%K *Computers %K *Educational Measurement %K *Models, Theoretical %K Automation %K Decision Making %K Humans %K Reproducibility of Results %X The purpose of the present investigation was to systematically examine the effectiveness of the Sympson-Hetter technique and rotated content balancing relative to no exposure control and no content rotation conditions in a computerized adaptive testing system (CAT) based on the partial credit model. A series of simulated fixed and variable length CATs were run using two data sets generated to multiple content areas for three sizes of item pools. The 2 (exposure control) X 2 (content rotation) X 2 (test length) X 3 (item pool size) X 2 (data sets) design yielded a total of 48 conditions. Results show that while both procedures can be used with no deleterious effect on measurement precision, the gains in exposure control, pool utilization, and item overlap appear quite modest. Difficulties involved with setting the exposure control parameters in small item pools make questionable the utility of the Sympson-Hetter technique with similar item pools. %B Journal of Applied Measurement %V 4 %P 24-42 %G eng %M 12700429 %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Implementing content constraints in alpha-stratified adaptive testing using a shadow test approach %A van der Linden, W. J. %A Chang, Hua-Hua %B Applied Psychological Measurement %V 27 %P 107-120 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2003 %T Incorporation of Content Balancing Requirements in Stratification Designs for Computerized Adaptive Testing %A Leung, Chi-Keung %A Chang, Hua-Hua %A Hau, Kit-Tai %X

In computerized adaptive testing, the multistage a-stratified design advocates a new philosophy on pool management and item selection in which, contradictory to common practice, less discriminating items are used first. The method is effective in reducing item-overlap rate and enhancing pool utilization. This stratification method has been extended in different ways to deal with the practical issues of content constraints and the positive correlation between item difficulty and discrimination. Nevertheless, these modified designs on their own do not automatically satisfy content requirements. In this study, three stratification designs were examined in conjunction with three well developed content balancing methods. The performance of each of these nine combinational methods was evaluated in terms of their item security, measurement efficiency, and pool utilization. Results showed substantial differences in item-overlap rate and pool utilization among different methods. An optimal combination of stratification design and content balancing method is recommended.

%B Educational and Psychological Measurement %V 63 %P 257-270 %U http://epm.sagepub.com/content/63/2/257.abstract %R 10.1177/0013164403251326 %0 Journal Article %J Quality of Life Research %D 2003 %T Item banking to improve, shorten and computerize self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale %A Lai, J-S. %A Crane, P. K. %A Cella, D. %A Chang, C-H. %A Bode, R. K. %A Heinemann, A. W. %K *Health Status Indicators %K *Questionnaires %K Adult %K Fatigue/*diagnosis/etiology %K Female %K Humans %K Male %K Middle Aged %K Neoplasms/complications %K Psychometrics %K Research Support, Non-U.S. Gov't %K Research Support, U.S. Gov't, P.H.S. %K Sickness Impact Profile %X Fatigue is a common symptom among cancer patients and the general population. Due to its subjective nature, fatigue has been difficult to effectively and efficiently assess. Modern computerized adaptive testing (CAT) can enable precise assessment of fatigue using a small number of items from a fatigue item bank. CAT enables brief assessment by selecting questions from an item bank that provide the maximum amount of information given a person's previous responses. This article illustrates steps to prepare such an item bank, using 13 items from the Functional Assessment of Chronic Illness Therapy Fatigue Subscale (FACIT-F) as the basis. Samples included 1022 cancer patients and 1010 people from the general population. An Item Response Theory (IRT)-based rating scale model, a polytomous extension of the Rasch dichotomous model, was utilized. Nine items demonstrating acceptable psychometric properties were selected and positioned on the fatigue continuum. The fatigue levels measured by these nine items along with their response categories covered 66.8% of the general population and 82.6% of the cancer patients. 
Although the operational CAT algorithms to handle polytomously scored items are still in progress, we illustrated how CAT may work by using nine core items to measure level of fatigue. Using this illustration, a fatigue measure comparable to its full-length 13-item scale administration was obtained using four items. The resulting item bank can serve as a core to which will be added a psychometrically sound and operational item bank covering the entire fatigue continuum. %B Quality of Life Research %V 12 %P 485-501 %8 Aug %G eng %M 13677494 %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Optimal stratification of item pools in α-stratified computerized adaptive testing %A Chang, Hua-Hua %A van der Linden, W. J. %K Adaptive Testing %K Computer Assisted Testing %K Item Content (Test) %K Item Response Theory %K Mathematical Modeling %K Test Construction computerized adaptive testing %X A method based on 0-1 linear programming (LP) is presented to stratify an item pool optimally for use in α-stratified adaptive testing. Because the 0-1 LP model belongs to the subclass of models with a network flow structure, efficient solutions are possible. The method is applied to a previous item pool from the computerized adaptive testing (CAT) version of the Graduate Record Exams (GRE) Quantitative Test. The results indicate that the new method performs well in practical situations. It improves item exposure control, reduces the mean squared error in the θ estimates, and increases test reliability. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Applied Psychological Measurement %V 27 %P 262-274 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2003 %T Predicting item exposure parameters in computerized adaptive testing %A Chen, S-Y. %A Doong, H. 
%B Paper presented at the annual meeting of the American Educational Research Association %C Chicago IL %G eng %0 Journal Article %J Journal of Educational Measurement %D 2003 %T The relationship between item exposure and test overlap in computerized adaptive testing %A Chen, S-Y. %A Ankenmann, R. D. %A Spray, J. A. %K Adaptive Testing %K Computer Assisted Testing %K Human Computer Interaction %K computerized adaptive testing %K Item Analysis (Statistical) %K Item Analysis (Test) %K Test Items %X The purpose of this article is to present an analytical derivation for the mathematical form of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CATs). This algebraic relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. The results indicate that, in fixed-length CATs, control of the average between-test overlap is achieved via the mean and variance of the item exposure rates of the items that constitute the CAT item pool. The mean of the item exposure rates is easily manipulated. Control over the variance of the item exposure rates can be achieved via the maximum item exposure rate (r_max). Therefore, item exposure control methods which implement a specification of r_max (e.g., J. B. Sympson and R. D. Hetter, 1985) provide the most direct control at both the item and test levels. (PsycINFO Database Record (c) 2005 APA) %B Journal of Educational Measurement %V 40 %P 129-145 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T A simulation study to compare CAT strategies for cognitive diagnosis %A Xu, X. %A Chang, Hua-Hua %A Douglas, J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 2003 %T Test-score comparability, ability estimation, and item-exposure control in computerized adaptive testing %A Chang, Hua-Hua %A Ying, Z. %B Paper presented at the Annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Seminars in Oncology %D 2002 %T Advances in quality of life measurements in oncology patients %A Cella, D. %A Chang, C-H. %A Lai, J. S. %A Webster, K. %K *Quality of Life %K *Sickness Impact Profile %K Cross-Cultural Comparison %K Culture %K Humans %K Language %K Neoplasms/*physiopathology %K Questionnaires %X Accurate assessment of the quality of life (QOL) of patients can provide important clinical information to physicians, especially in the area of oncology. Changes in QOL are important indicators of the impact of a new cytotoxic therapy, can affect a patient's willingness to continue treatment, and may aid in defining response in the absence of quantifiable endpoints such as tumor regression. Because QOL is becoming an increasingly important aspect in the management of patients with malignant disease, it is vital that the instruments used to measure QOL are reliable and accurate. Assessment of QOL involves a multidimensional approach that includes physical, functional, social, and emotional well-being, and the most comprehensive instruments measure at least three of these domains. 
Instruments to measure QOL can be generic (eg, the Nottingham Health Profile), targeted toward specific illnesses (eg, Functional Assessment of Cancer Therapy-Lung), or be a combination of generic and targeted. Two of the most widely used examples of the combination, or hybrid, instruments are the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 Items and the Functional Assessment of Chronic Illness Therapy. A consequence of the increasing international collaboration in clinical trials has been the growing necessity for instruments that are valid across languages and cultures. To assure the continuing reliability and validity of QOL instruments in this regard, item response theory can be applied. Techniques such as item response theory may be used in the future to construct QOL item banks containing large sets of validated questions that represent various levels of QOL domains. As QOL becomes increasingly important in understanding and approaching the overall management of cancer patients, the tools available to clinicians and researchers to assess QOL will continue to evolve. While the instruments currently available provide reliable and valid measurement, further improvements in precision and application are anticipated. %B Seminars in Oncology %V 29 %P 60-8 %8 Jun %G eng %M 12082656 %0 Journal Article %J Educational Media International %D 2002 %T Applicable adaptive testing models for school teachers %A Wang, Chang-Hwa %A Chuang, C-L. %X The purpose of this study was to investigate the attitudinal effects of an SPRT adaptive testing environment on junior high school students. Subjects were 39 eighth graders from a selected junior high school. 
Major instruments for the study were the Junior High School Natural Sciences Adaptive Testing System driven by the SPRT algorithm and a self-developed attitudinal questionnaire; factors examined included test anxiety, examinee preference, adaptability of the test, and acceptance of the test result. The major findings were that, overall, junior high school students' attitudes towards computerized adaptive tests were positive; no significant correlations existed between test attitude and test length. The results indicated that junior high school students generally have positive attitudes towards adaptive testing. %B Educational Media International %V 39 %P 55-59 %G eng %M EJ654148 %0 Journal Article %J Journal of Educational Measurement %D 2002 %T Can examinees use judgments of item difficulty to improve proficiency estimates on computerized adaptive vocabulary tests? %A Vispoel, W. P. %A Clough, S. J. %A Bleiler, T. %A Hendrickson, A. B. %A Ihrig, D. %B Journal of Educational Measurement %V 39 %P 311-330 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2002 %T Comparing three item selection approaches for computerized adaptive testing with content balancing requirement %A Leung, C.-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Journal Article %J Applied Psychological Measurement %D 2002 %T A comparison of item selection techniques and exposure control mechanisms in CATs using the generalized partial credit model %A Pastor, D. A. %A Dodd, B. G. 
%A Chang, Hua-Hua %K Adaptive Testing %K Algorithms %K computerized adaptive testing %K Computer Assisted Testing %K Item Analysis (Statistical) %K Item Response Theory %K Mathematical Modeling %X The use of more performance items in large-scale testing has led to an increase in the research investigating the use of polytomously scored items in computer adaptive testing (CAT). Because this research has to be complemented with information pertaining to exposure control, the present research investigated the impact of using five different exposure control algorithms in two sized item pools calibrated using the generalized partial credit model. The results of the simulation study indicated that the a-stratified design, in comparison to a no-exposure control condition, could be used to reduce item exposure and overlap, increase pool utilization, and only minorly degrade measurement precision. Use of the more restrictive exposure control algorithms, such as the Sympson-Hetter and conditional Sympson-Hetter, controlled exposure to a greater extent but at the cost of measurement precision. Because convergence of the exposure control parameters was problematic for some of the more restrictive exposure control algorithms, use of the more simplistic exposure control mechanisms, particularly when the test length to item pool size ratio is large, is recommended. (PsycINFO Database Record (c) 2005 APA) (journal abstract) %B Applied Psychological Measurement %V 26 %P 147-163 %G eng %0 Journal Article %J British Journal of Educational Technology %D 2002 %T Computerised adaptive testing %A Latu, E. %A Chapman, E. %K computerized adaptive testing %X Considers the potential of computer adaptive testing (CAT). Discusses the use of CAT instead of traditional paper and pencil tests, identifies decisions that impact the efficacy of CAT, and concludes that CAT is beneficial when used to its full potential on certain types of tests. 
(LRW) %B British Journal of Educational Technology %V 33 %P 619-22 %G eng %M EJ657892 %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Fairness issues in adaptive tests with strict time limits %A Bridgeman, B. %A Cline, F. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Journal Article %J Quality of Life Research %D 2002 %T Feasibility and acceptability of computerized adaptive testing (CAT) for fatigue monitoring in clinical practice %A Davis, K. M. %A Chang, C-H. %A Lai, J-S. %A Cella, D. %B Quality of Life Research %V 11(7) %P 134 %G eng %0 Journal Article %J Psychometrika %D 2002 %T Hypergeometric family and item overlap rates in computerized adaptive testing %A Chang, Hua-Hua %A Zhang, J. %K Adaptive Testing %K Algorithms %K Computer Assisted Testing %K Test Taking %K Time On Task %K computerized adaptive testing %X A computerized adaptive test (CAT) is usually administered to small groups of examinees at frequent time intervals. It is often the case that examinees who take the test earlier share information with examinees who will take the test later, thus increasing the risk that many items may become known. Item overlap rate for a group of examinees refers to the number of overlapping items encountered by these examinees divided by the test length. For a specific item pool, different item selection algorithms may yield different item overlap rates. An important issue in designing a good CAT item selection algorithm is to keep item overlap rate below a preset level. In doing so, it is important to investigate what the lowest rate could be for all possible item selection algorithms. 
In this paper we rigorously prove that if every item had an equal probability to be selected from the pool in a fixed-length CAT, the number of overlapping items among any α randomly sampled examinees follows the hypergeometric distribution family for α ≥ 1. Thus, the expected values of the number of overlapping items among any α randomly sampled examinees can be calculated precisely. These values may serve as benchmarks in controlling item overlap rates for fixed-length adaptive tests. (PsycINFO Database Record (c) 2005 APA) %B Psychometrika %V 67 %P 387-398 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Identify the lower bounds for item sharing and item pooling in computerized adaptive testing %A Chang, Hua-Hua %A Zhang, J. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Journal Article %J Applied Psychological Measurement %D 2002 %T Item selection in computerized adaptive testing: Improving the a-stratified design with the Sympson-Hetter algorithm %A Leung, C.-K. %A Chang, Hua-Hua %A Hau, K-T. %B Applied Psychological Measurement %V 26 %P 376-392 %@ 0146-6216 %G eng %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2002 %T Measuring quality of life in chronic illness: the functional assessment of chronic illness therapy measurement system %A Cella, D. %A Nowinski, C. J. %K *Chronic Disease %K *Quality of Life %K *Rehabilitation %K Adult %K Comparative Study %K Health Status Indicators %K Humans %K Psychometrics %K Questionnaires %K Research Support, U.S. Gov't, P.H.S. 
%K Sensitivity and Specificity %X We focus on quality of life (QOL) measurement as applied to chronic illness. There are 2 major types of health-related quality of life (HRQOL) instruments: generic health status and targeted. Generic instruments offer the opportunity to compare results across patient and population cohorts, and some can provide normative or benchmark data from which to interpret results. Targeted instruments ask questions that focus more on the specific condition or treatment under study and, as a result, tend to be more responsive to clinically important changes than generic instruments. Each type of instrument has a place in the assessment of HRQOL in chronic illness, and consideration of the relative advantages and disadvantages of the 2 options best drives choice of instrument. The Functional Assessment of Chronic Illness Therapy (FACIT) system of HRQOL measurement is a hybrid of the 2 approaches. The FACIT system combines a core general measure with supplemental measures targeted toward specific diseases, conditions, or treatments. Thus, it capitalizes on the strengths of each type of measure. Recently, FACIT questionnaires were administered to a representative sample of the general population with results used to derive FACIT norms. These normative data can be used for benchmarking and to better understand changes in HRQOL that are often seen in clinical trials. Future directions in HRQOL assessment include test equating, item banking, and computerized adaptive testing. %B Archives of Physical Medicine and Rehabilitation %V 83 %P S10-7 %8 Dec %G eng %M 12474167 %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Optimum number of strata in the a-stratified adaptive testing design %A Wen, J.-B. %A Chang, Hua-Hua %A Hau, K-T. 
%B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Redeveloping the exposure control parameters of CAT items when a pool is modified %A Chang, S-W. %A Harris, D. J. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T The robustness of the unidimensional 3PL IRT model when applied to two-dimensional data in computerized adaptive testing %A Zhao, J. C. %A McMorris, R. F. %A Pruzek, R. M. %A Chen, R. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Book Section %D 2002 %T Test models for complex computer-based testing %A Luecht, RM %A Clauser, B. E. %C C. N. Mills, M. T. Potenza, J. J. Fremer, and W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 67-88). Hillsdale NJ: Erlbaum. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2002 %T To weight or not to weight – balancing influence of initial and later items in CAT %A Chang, Hua-Hua %A Ying, Z. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Using judgments of item difficulty to change answers on computerized adaptive vocabulary tests %A Vispoel, W. P. %A Clough, S. J. %A Bleiler, T. 
%B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T Adaptation of a-stratified method in variable length computerized adaptive testing %A Wen, J.-B. %A Chang, Hua-Hua %A Hau, K.-T. %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %G eng %0 Journal Article %J Teacher Development %D 2001 %T Assessment in the twenty-first century: A role of computerised adaptive testing in national curriculum subjects %A Cowan, P. %A Morrison, H. %K computerized adaptive testing %X With the investment of large sums of money in new technologies for schools and education authorities and the subsequent training of teachers to integrate Information and Communications Technology (ICT) into their teaching strategies, it is remarkable that the old outdated models of assessment still remain. This article highlights the current problems associated with pen-and-paper testing and offers suggestions for an innovative and new approach to assessment for the twenty-first century. Based on the principle of the 'wise examiner' a computerised adaptive testing system which measures pupils' ability against the levels of the United Kingdom National Curriculum has been developed for use in mathematics. Using constructed response items, pupils are administered a test tailored to their ability with a reliability index of 0.99. Since the software administers maximally informative questions matched to each pupil's current ability estimate, no two pupils will receive the same set of items in the same order, therefore removing opportunities for plagiarism and teaching to the test. All marking is automated and a journal recording the outcome of the test and highlighting the areas of difficulty for each pupil is available for printing by the teacher. 
The current prototype of the system can be used on a school's network; however, the authors envisage a day when Examination Boards or the Qualifications and Curriculum Authority (QCA) will administer Government tests from a central server to all United Kingdom schools or testing centres. Results will be issued at the time of testing and opportunities for resits will become more widespread. %B Teacher Development %V 5 %P 241-57 %G eng %M EJ644183 %0 Conference Paper %B Paper presented at the Annual Meeting of the Psychometric Society %D 2001 %T a-stratified CAT design with content-blocking %A Yi, Q. %A Chang, Hua-Hua %B Paper presented at the Annual Meeting of the Psychometric Society %C King of Prussia, PA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T a-stratified computerized adaptive testing with unequal item exposure across strata %A Deng, H. %A Chang, Hua-Hua %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %G eng %0 Journal Article %J Applied Psychological Measurement %D 2001 %T a-Stratified multistage computerized adaptive testing with b blocking %A Chang, Hua-Hua %A Qian, J. %A Ying, Z. %X Chang & Ying’s (1999) computerized adaptive testing item-selection procedure stratifies the item bank according to a parameter values and requires b parameter values to be evenly distributed across all strata. Thus, a and b parameter values must be incorporated into how strata are formed. A refinement is proposed, based on Weiss’ (1973) stratification of items according to b values. Simulation studies using a retired item bank of a Graduate Record Examination test indicate that the new approach improved control of item exposure rates and reduced mean squared errors. 
%B Applied Psychological Measurement %V 25 %P 333-341 %@ 0146-6216 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T Can examinees use judgments of item difficulty to improve proficiency estimates on computerized adaptive vocabulary tests? %A Vispoel, W. P. %A Clough, S. J. %A Bleiler, T. %A Hendrickson, A. B. %A Ihrig, D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T Deriving a stopping rule for sequential adaptive tests %A Grabovsky, I. %A Chang, Hua-Hua %A Ying, Z. %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %G eng %0 Journal Article %J American Journal of Preventive Medicine %D 2001 %T Development of an adaptive multimedia program to collect patient health data %A Sutherland, L. A. %A Campbell, M. %A Ornstein, K. %A Wildemuth, B. %A Lobach, D. %B American Journal of Preventive Medicine %V 21 %P 320-324 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T Effects of changes in the examinees’ ability distribution on the exposure control methods in CAT %A Chang, S-W. %A Twu, B.-Y. 
%B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %0 Conference Paper %B Paper presented at the Annual Meeting of the American Educational Research Association %D 2001 %T An examination of item selection rules by stratified CAT designs integrated with content balancing methods %A Leung, C.-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the Annual Meeting of the American Educational Research Association %C Seattle WA %G eng %0 Journal Article %J American School Board Journal %D 2001 %T Final answer? %A Coyle, J. %K computerized adaptive testing %X The Northwest Evaluation Association helped an Indiana school district develop a computerized adaptive testing system that was aligned with its curriculum and geared toward measuring individual student growth. Now the district can obtain such information from semester to semester and year to year, get immediate results, and test students on demand. (MLH) %B American School Board Journal %V 188 %P 24-26 %G eng %M EJ623034 %0 Generic %D 2001 %T Implementing content constraints in a-stratified adaptive testing using a shadow test approach (Research Report 01-001) %A Chang, Hua-Hua %A van der Linden, W. J. %C University of Twente, Department of Educational Measurement and Data Analysis %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %D 2001 %T Integrating stratification and information approaches for multiple constrained CAT %A Leung, C.-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Journal Article %J Journal of Educational Measurement %D 2001 %T Item selection in computerized adaptive testing: Should more discriminating items be used first? 
%A Hau, Kit-Tai %A Chang, Hua-Hua %K Ability %K Adaptive Testing %K Computer Assisted Testing %K Statistical Estimation %K Test Items %K computerized adaptive testing %X During computerized adaptive testing (CAT), items are selected continuously according to the test-taker's estimated ability. Test security has become a problem because high-discrimination items are more likely to be selected and become overexposed. So, there seems to be a tradeoff between high efficiency in ability estimations and balanced usage of items. This series of four studies addressed the dilemma by focusing on the notion of whether more or less discriminating items should be used first in CAT. The first study demonstrated that the common maximum information method with J. B. Sympson and R. D. Hetter (1985) control resulted in the use of more discriminating items first. The remaining studies showed that using items in the reverse order, as described in H. Chang and Z. Ying's (1999) stratified method, had potential advantages: (a) a more balanced item usage and (b) a relatively stable resultant item pool structure with easy and inexpensive management. This stratified method may have ability-estimation efficiency better than or close to that of other methods. It is argued that the judicious selection of items, as in the stratified method, is a more active control of item exposure. (PsycINFO Database Record (c) 2005 APA) %B Journal of Educational Measurement %V 38 %P 249-266 %G eng %0 Journal Article %J Psychometrika %D 2001 %T On maximizing item information and matching difficulty with ability %A Bickel, P. %A Buyske, S. %A Chang, Hua-Hua %A Ying, Z. %B Psychometrika %V 66 %P 69-77 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Psychological Association %D 2001 %T Measurement efficiency of multidimensional computerized adaptive testing %A Wang, W-C. %A Chen, B.-H. 
%B Paper presented at the annual meeting of the American Psychological Association %C San Francisco CA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T A new approach to simulation studies in computerized adaptive testing %A Chen, S-Y. %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %G eng %0 Conference Paper %B Paper presented at National Council on Measurement in Education Annual Meeting %D 2001 %T On-line Calibration Using PARSCALE Item Specific Prior Method: Changing Test Population and Sample Size %A Guo, F. %A Stone, E. %A Cruz, D. %B Paper presented at National Council on Measurement in Education Annual Meeting %C Seattle, Washington %G eng %0 Book Section %B Setting performance standards: Concepts, methods, and perspectives %D 2001 %T Practical issues in setting standards on computerized adaptive tests %A Sireci, S. G. %A Clauser, B. E. %K Adaptive Testing %K Computer Assisted Testing %K Performance Tests %K Testing Methods %X (From the chapter) Examples of setting standards on computerized adaptive tests (CATs) are hard to find. Some examples of CATs involving performance standards include the registered nurse exam and the Novell systems engineer exam. Although CATs do not require separate standard-setting methods, there are special issues to be addressed by test specialists who set performance standards on CATs. Setting standards on a CAT will typically require modifications of the procedures used with more traditional, fixed-form, paper-and-pencil examinations. The purpose of this chapter is to illustrate why CATs pose special challenges to the standard setter. (PsycINFO Database Record (c) 2005 APA) %B Setting performance standards: Concepts, methods, and perspectives %I Lawrence Erlbaum Associates, Inc. %C Mahwah, N.J. 
USA %P 355-369 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2000 %T A comparison of item selection rules at the early stages of computerized adaptive testing %A Chen, S-Y. %A Ankenmann, R. D. %A Chang, Hua-Hua %K Adaptive Testing %K Computer Assisted Testing %K Item Analysis (Test) %K Statistical Estimation %K computerized adaptive testing %X The effects of 5 item selection rules--Fisher information (FI), Fisher interval information (FII), Fisher information with a posterior distribution (FIP), Kullback-Leibler information (KL), and Kullback-Leibler information with a posterior distribution (KLP)--were compared with respect to the efficiency and precision of trait (θ) estimation at the early stages of computerized adaptive testing (CAT). FII, FIP, KL, and KLP performed marginally better than FI at the early stages of CAT for θ=-3 and -2. For tests longer than 10 items, there appeared to be no precision advantage for any of the selection rules. (PsycINFO Database Record (c) 2005 APA) (journal abstract) %B Applied Psychological Measurement %V 24 %P 241-255 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2000 %T Content balancing in stratified computerized adaptive testing designs %A Leung, C.-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans, LA %G eng %0 Journal Article %J Psychometrika %D 2000 %T Does adaptive testing violate local independence? %A Mislevy, R. J. 
%A Chang, Hua-Hua %B Psychometrika %V 65 %P 149-156 %G eng %0 Generic %D 2000 %T Estimating item parameters from classical indices for item pool development with a computerized classification test (ACT Research 2000-4) %A Chang, C.-Y. %A Kalohn, J.C. %A Lin, C.-J. %A Spray, J. %C Iowa City IA: ACT, Inc. %G eng %0 Journal Article %J Applied Psychological Measurement %D 2000 %T Estimation of trait level in computerized adaptive testing %A Cheng, P. E. %A Liou, M. %K Adaptive Testing %K Computer Assisted Testing %K Item Analysis (Statistical) %K Statistical Estimation %K computerized adaptive testing %X Notes that in computerized adaptive testing (CAT), an examinee's trait level (θ) must be estimated with reasonable accuracy based on a small number of item responses. A successful implementation of CAT depends on (1) the accuracy of statistical methods used for estimating θ and (2) the efficiency of the item-selection criterion. Methods of estimating θ suitable for CAT are reviewed, and the differences between Fisher and Kullback-Leibler information criteria for selecting items are discussed. The accuracy of different CAT algorithms was examined in an empirical study. The results show that correcting θ estimates for bias was necessary at earlier stages of CAT, but most CAT algorithms performed equally well for tests of 10 or more items. (PsycINFO Database Record (c) 2005 APA) %B Applied Psychological Measurement %V 24 %P 257-265 %G eng %0 Journal Article %J Chronicle of Higher Education %D 2000 %T ETS finds flaws in the way online GRE rates some students %A Carlson, S. %B Chronicle of Higher Education %V 47 %P a47 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2000 %T An examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model %A Davis, L. L. %A Pastor, D. A. %A Dodd, B. G. %A Chiang, C. %A Fitzpatrick, S. 
%B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans, LA %G eng %0 Generic %D 2000 %T Multiple stratification CAT designs with content control %A Yi, Q. %A Chang, Hua-Hua %C Unpublished manuscript %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the American Educational Research Association %D 2000 %T Performance of item exposure control methods in computerized adaptive testing: Further explorations %A Chang, Hua-Hua %A Chang, S. %A Ansley %B Paper presented at the Annual Meeting of the American Educational Research Association %C New Orleans, LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Solving complex constraints in a-stratified computerized adaptive testing designs %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans, USA %G eng %0 Book Section %D 2000 %T Using Bayesian Networks in Computerized Adaptive Tests %A Millan, E. %A Trella, M %A Perez-de-la-Cruz, J.-L. %A Conejo, R %C M. Ortega and J. Bravo (Eds.), Computers and Education in the 21st Century. Kluwer, pp. 217-228. %G eng %0 Conference Paper %B Paper presented at the Computer-Assisted Testing Conference. %D 2000 %T Using constraints to develop and deliver adaptive tests %A Abdullah, S. C %A Cooley, R. E. %B Paper presented at the Computer-Assisted Testing Conference. %G eng %0 Generic %D 2000 %T Variations in mean response times for questions on the computer-adaptive GRE general test: Implications for fair assessment (GRE Board Professional Report No. 96-20P) %A Bridgeman, B. %A Cline, F. %C Educational Testing Service Research Report 00-7 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1999 %T a-stratified multistage computerized adaptive testing %A Chang, Hua-Hua %A Ying, Z.
%K computerized adaptive testing %X For computerized adaptive tests (CAT) based on the three-parameter logistic model, it was found that administering items with low discrimination parameter (a) values early in the test and administering those with high a values later was advantageous; the skewness of item exposure distributions was reduced while efficiency was maintained in trait level estimation. Thus, a new multistage adaptive testing approach is proposed that factors a into the item selection process. In this approach, the items in the item bank are stratified into a number of levels based on their a values. The early stages of a test use items with lower a values and later stages use items with higher a values. At each stage, items are selected according to an optimization criterion from the corresponding level. Simulation studies were performed to compare a-stratified CATs with CATs based on the Sympson-Hetter method for controlling item exposure. Results indicated that this new strategy led to tests that were well-balanced with respect to item exposure, and efficient. The a-stratified CATs achieved a lower average exposure rate than CATs based on Bayesian or information-based item selection and the Sympson-Hetter method. (PsycINFO Database Record (c) 2003 APA, all rights reserved). %B Applied Psychological Measurement %V 23 %P 211-222 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1999 %T An enhanced stratified computerized adaptive testing design %A Leung, C-K.
%A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the American Educational Research Association %C Montreal, Canada %G eng %0 Generic %D 1999 %T Exploring the relationship between item exposure rate and test overlap rate in computerized adaptive testing %A Chen, S. %A Ankenmann, R. D. %A Spray, J. A. %C Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Canada %G eng %0 Generic %D 1999 %T Exploring the relationship between item exposure rate and test overlap rate in computerized adaptive testing (ACT Research Report series 99-5) %A Chen, S-Y. %A Ankenmann, R. D. %A Spray, J. A. %C Iowa City IA: ACT, Inc %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Fairness in computer-based testing %A Gallagher, A. %A Bridgeman, B. %A Calahan, C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the American Educational Research Association %D 1999 %T Item selection in computerized adaptive testing: improving the a-stratified design with the Sympson-Hetter algorithm %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the Annual Meeting of the American Educational Research Association %C Montreal, Canada %G eng %0 Conference Paper %B Paper presented at the annual meeting of American Educational Research Association %D 1999 %T Performance of the Sympson-Hetter exposure control algorithm with a polytomous item bank %A Pastor, D. A. %A Chiang, C. %A Dodd, B. G. %A Yockey, R. %B Paper presented at the annual meeting of American Educational Research Association %C Montreal, Canada %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T The use of linear-on-the-fly testing for TOEFL Reading %A Carey, P. A.
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Generic %D 1999 %T WISCAT: Een computergestuurd toetspakket voor rekenen en wiskunde [A computerized test package for arithmetic and mathematics] %A Cito. %C Cito: Arnhem, The Netherlands %G eng %0 Book %D 1998 %T Applications of network flows to computerized adaptive testing %A Cordova, M. J. %C Dissertation, Rutgers Center for Operations Research (RUTCOR), Rutgers University, New Brunswick NJ %G eng %0 Journal Article %J Dissertation Abstracts International: Section B: the Sciences & Engineering %D 1998 %T Applications of network flows to computerized adaptive testing %A Claudio, M. J. C. %K computerized adaptive testing %X Recently, the concept of Computerized Adaptive Testing (CAT) has been receiving ever growing attention from the academic community. This is so because of both practical and theoretical considerations. Its practical importance lies in the advantages of CAT over the traditional (perhaps outdated) paper-and-pencil test in terms of time, accuracy, and money. The theoretical interest is sparked by its natural relationship to Item Response Theory (IRT). This dissertation offers a mathematical programming approach which creates a model that generates a CAT that takes care of many questions concerning the test, such as feasibility, accuracy and time of testing, as well as item pool security. The CAT generated is designed to obtain the most information about a single test taker. Several methods for estimating the examinee's ability, based on the (dichotomous) responses to the items in the test, are also offered here. (PsycINFO Database Record (c) 2003 APA, all rights reserved).
%B Dissertation Abstracts International: Section B: the Sciences & Engineering %V 59 %P 0855 %G eng %0 Journal Article %J Journal of the American Statistical Association %D 1998 %T Bayesian identification of outliers in computerized adaptive testing %A Bradlow, E. T. %A Weiss, R. E. %A Cho, M. %X We consider the problem of identifying examinees with aberrant response patterns in a computerized adaptive test (CAT). The vector of responses yi of person i from the CAT comprises a multivariate response vector. Multivariate observations may be outlying in many different directions, and we characterize specific directions as corresponding to outliers with different interpretations. We develop a class of outlier statistics to identify different types of outliers based on a control chart type methodology. The outlier methodology is adaptable to general longitudinal discrete data structures. We consider several procedures to judge how extreme a particular outlier is. Data from the National Council Licensure Examination (NCLEX) motivates our development and is used to illustrate the results. %B Journal of the American Statistical Association %V 93 %P 910-919 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1998 %T CAT item calibration %A Hsu, Y. %A Thompson, T.D. %A Chen, W-H. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego %G eng %0 Generic %D 1998 %T A comparative study of item exposure control methods in computerized adaptive testing %A Chang, S-W. %A Twu, B.-Y. %C Research Report Series 98-3, Iowa City: American College Testing. %G eng %0 Book %D 1998 %T A comparative study of item exposure control methods in computerized adaptive testing %A Chang, S-W.
%C Unpublished doctoral dissertation, University of Iowa, Iowa City IA %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1998 %T A comparison of maximum likelihood estimation and expected a posteriori estimation in CAT using the partial credit model %A Chen, S. %A Hou, L. %A Dodd, B. G. %B Educational and Psychological Measurement %V 58 %P 569-595 %G eng %0 Conference Paper %B Paper presented at the meeting of the American Educational Research Association %D 1998 %T A comparison of two methods of controlling item exposure in computerized adaptive testing %A Tang, L. %A Jiang, H. %A Chang, Hua-Hua %B Paper presented at the meeting of the American Educational Research Association %C San Diego CA %G eng %0 Generic %D 1998 %T Does adaptive testing violate local independence? (Research Report 98-33) %A Mislevy, R. J. %A Chang, Hua-Hua %C Princeton NJ: Educational Testing Service %G eng %0 Journal Article %J Journal of Educational Measurement %D 1998 %T Item selection in computerized adaptive testing: Should more discriminating items be used first? %A Hau, K. T. %A Chang, Hua-Hua %B Journal of Educational Measurement %V 38 %P 249-266 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1998 %T Item selection in computerized adaptive testing: Should more discriminating items be used first? %A Hau, K. T. %A Chang, Hua-Hua %B Paper presented at the annual meeting of the American Educational Research Association %C San Diego, CA %G eng %0 Journal Article %J Advances in Health Sciences Education %D 1998 %T Maintaining content validity in computerized adaptive testing %A Luecht, RM %A de Champlain, A. %A Nungester, R. J. %K computerized adaptive testing %X The authors empirically demonstrate some of the trade-offs which can occur when content balancing is imposed in computerized adaptive testing (CAT) forms or, conversely, when it is ignored.
The authors contend that the content validity of a CAT form can actually change across a score scale when content balancing is ignored. However, they caution that efficiency and score precision can be severely reduced by overspecifying content restrictions in a CAT form. The results from 2 simulation studies are presented as a means of highlighting some of the trade-offs that could occur between content and statistical considerations in CAT form assembly. (PsycINFO Database Record (c) 2003 APA, all rights reserved). %B Advances in Health Sciences Education %V 3 %P 29-41 %G eng %0 Journal Article %J International Journal of Selection and Assessment %D 1998 %T Swedish Enlistment Battery: Construct validity and latent variable estimation of cognitive abilities by the CAT-SEB %A Mardberg, B. %A Carlstedt, B. %B International Journal of Selection and Assessment %V 6 %P 107-114 %G eng %0 Generic %D 1997 %T CAST 5 for Windows users' guide %A J. R. McBride %A Cooper, R. R %C Contract No. MDA903-93-D-0032, DO 0054. Alexandria, VA: Human Resources Research Organization %G eng %0 Book Section %D 1997 %T CAT-ASVAB cost and benefit analyses %A Wise, L. L. %A Curran, L. T. %A J. R. McBride %C W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computer adaptive testing: From inquiry to operation (pp. 227-236). Washington, DC: American Psychological Association. %G eng %0 Journal Article %J Dissertation Abstracts International: Section B: the Sciences & Engineering %D 1997 %T A comparison of maximum likelihood estimation and expected a posteriori estimation in computerized adaptive testing using the generalized partial credit model %A Chen, S-K. %K computerized adaptive testing %X A simulation study was conducted to investigate the application of expected a posteriori (EAP) trait estimation in computerized adaptive tests (CAT) based on the generalized partial credit model (Muraki, 1992), and to compare the performance of EAP with maximum likelihood trait estimation (MLE).
The performance of EAP was evaluated under different conditions: the number of quadrature points (10, 20, and 30), and the type of prior distribution (normal, uniform, negatively skewed, and positively skewed). The relative performance of the MLE and EAP estimation methods was assessed under two distributional forms of the latent trait, one normal and the other negatively skewed. Also, both the known item parameters and estimated item parameters were employed in the simulation study. Descriptive statistics, correlations, scattergrams, accuracy indices, and audit trails were used to compare the different methods of trait estimation in CAT. The results showed that, regardless of the latent trait distribution, MLE and EAP with a normal prior, a uniform prior, or the prior that matches the latent trait distribution using either 20 or 30 quadrature points provided relatively accurate estimation in CAT based on the generalized partial credit model. However, EAP using only 10 quadrature points did not work well in the generalized partial credit CAT. Also, the study found that increasing the number of quadrature points from 20 to 30 did not increase the accuracy of EAP estimation. Therefore, it appears that 20 or more quadrature points are sufficient for accurate EAP estimation. The results also showed that EAP with a negatively skewed or positively skewed prior performed poorly for the normal data set, and EAP with a positively skewed prior did not provide accurate estimates for the negatively skewed data set. Furthermore, trait estimation in CAT using estimated item parameters produced results similar to those obtained using known item parameters. In general, when at least 20 quadrature points are used, EAP estimation with a normal prior, a uniform prior, or the prior that matches the latent trait distribution appears to be a good alternative to MLE in the application of polytomous CAT based on the generalized partial credit model.
(PsycINFO Database Record (c) 2003 APA, all rights reserved). %B Dissertation Abstracts International: Section B: the Sciences & Engineering %V 58 %P 453 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Computer assembly of tests so that content reigns supreme %A Case, S. M. %A Luecht, RM %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Educational & Psychological Measurement %D 1997 %T The effect of population distribution and method of theta estimation on computerized adaptive testing (CAT) using the rating scale model %A Chen, S-K. %A Hou, L. Y. %A Fitzpatrick, S. J. %A Dodd, B. G. %K computerized adaptive testing %X Investigated the effect of population distribution on maximum likelihood estimation (MLE) and expected a posteriori estimation (EAP) in a simulation study of computerized adaptive testing (CAT) based on D. Andrich's (1978) rating scale model. Comparisons were made among MLE and EAP with a normal prior distribution and EAP with a uniform prior distribution within 2 data sets: one generated using a normal trait distribution and the other using a negatively skewed trait distribution. Descriptive statistics, correlations, scattergrams, and accuracy indices were used to compare the different methods of trait estimation. The EAP estimation with a normal prior or uniform prior yielded results similar to those obtained with MLE, even though the prior did not match the underlying trait distribution. An additional simulation study based on real data suggested that more work is needed to determine the optimal number of quadrature points for EAP in CAT based on the rating scale model. The choice between MLE and EAP for particular measurement situations is discussed. (PsycINFO Database Record (c) 2003 APA, all rights reserved). 
%B Educational & Psychological Measurement %V 57 %P 422-439 %G eng %0 Journal Article %J Quality of Life Research %D 1997 %T Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing %A Revicki, D. A. %A Cella, D. F. %K *Health Status %K *HIV Infections/diagnosis %K *Quality of Life %K Diagnosis, Computer-Assisted %K Disease Progression %K Humans %K Psychometrics/*methods %X Health status assessment is frequently used to evaluate the combined impact of human immunodeficiency virus (HIV) disease and its treatment on functioning and well-being from the patient's perspective. No single health status measure can efficiently cover the range of problems in functioning and well-being experienced across HIV disease stages. Item response theory (IRT), item banking and computer adaptive testing (CAT) provide a solution to measuring health-related quality of life (HRQoL) across different stages of HIV disease. IRT allows us to examine the response characteristics of individual items and the relationship between responses to individual items and the responses to each other item in a domain. With information on the response characteristics of a large number of items covering a HRQoL domain (e.g. physical function, and psychological well-being), and information on the interrelationships between all pairs of these items and the total scale, we can construct more efficient scales. Item banks consist of large sets of questions representing various levels of a HRQoL domain that can be used to develop brief, efficient scales for measuring the domain.
CAT is the application of IRT and item banks to the tailored assessment of HRQoL domains specific to individual patients. Given the results of IRT analyses and computer-assisted test administration, more efficient and brief scales can be used to measure multiple domains of HRQoL for clinical trials and longitudinal observational studies. %B Quality of Life Research %7 1997/08/01 %V 6 %P 595-600 %8 Aug %@ 0962-9343 (Print) %G eng %M 9330558 %0 Generic %D 1997 %T Modification of the Computerized Adaptive Screening Test (CAST) for use by recruiters in all military services %A J. R. McBride %A Cooper, R. R %C Final Technical Report FR-WATSD-97-24, Contract No. MDA903-93-D-0032, DO 0054. Alexandria VA: Human Resources Research Organization. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society. Gatlinburg TN. %D 1997 %T Multi-stage CAT with stratified design %A Chang, Hua-Hua %A Ying, Z. %B Paper presented at the annual meeting of the Psychometric Society. Gatlinburg TN. %G eng %0 Journal Article %J The Annals of Statistics. %D 1997 %T Nonlinear sequential designs for logistic item response theory models with applications to computerized adaptive tests %A Chang, Hua-Hua %A Ying, Z. %B The Annals of Statistics. %G eng %0 Book %D 1997 %T Optimization methods in computerized adaptive testing %A Cordova, M. J. %C Unpublished doctoral dissertation, Rutgers University, New Brunswick NJ %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Relationship of response latency to test design, examinee ability, and item difficulty in computer-based test administration %A Swanson, D. B. %A Featherman, C. M. %A Case, A. M. %A Luecht, RM %A Nungester, R. 
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T A simulation study of the use of the Mantel-Haenszel and logistic regression procedures for assessing DIF in a CAT environment %A Ross, L. P. %A Nandakumar, R. %A Clauser, B. E. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Book Section %D 1996 %T Adaptive assessment using granularity hierarchies and Bayesian nets %A Collins, J. A. %A Greer, J. E. %A Huang, S. X. %C Frasson, C. and Gauthier, G. and Lesgold, A. (Eds.), Intelligent Tutoring Systems, Third International Conference, ITS'96, Montréal, Canada, June 1996 Proceedings. Lecture Notes in Computer Science 1086. Berlin Heidelberg: Springer-Verlag, pp. 569-577. %G eng %0 Book %D 1996 %T Adaptive testing with granularity %A Collins, J. A. %C Master's thesis, University of Saskatchewan, Department of Computer Science %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1996 %T Building a statistical foundation for computerized adaptive testing %A Chang, Hua-Hua %A Ying, Z. %B Paper presented at the annual meeting of the Psychometric Society %C Banff, Alberta, Canada %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1996 %T The effects of methods of theta estimation, prior distribution, and number of quadrature points on CAT using the graded response model %A Hou, L. %A Chen, S. %A Dodd, B. G. %A Fitzpatrick, S. J. %B Paper presented at the annual meeting of the American Educational Research Association %C New York NY %G eng %0 Journal Article %J Applied Psychological Measurement %D 1996 %T A global information approach to computerized adaptive testing %A Chang, Hua-Hua %A Ying, Z. %X Item selection in computerized adaptive testing (CAT) is typically based on Fisher information (or item information). At each stage, an item is selected to maximize the Fisher information at the currently estimated trait level (θ). However, this application of Fisher information could be much less efficient than assumed if the estimators are not close to the true θ, especially at early stages of an adaptive test when the test length (number of items) is too short to provide an accurate estimate for true θ. It is argued here that selection procedures based on global information should be used, at least at early stages of a test when θ estimates are not likely to be close to the true θ. For this purpose, an item selection procedure based on average global information is proposed. Results from pilot simulation studies comparing the usual maximum item information item selection with the proposed global information approach are reported, indicating that the new method leads to improvement in terms of bias and mean squared error reduction under many circumstances. Index terms: computerized adaptive testing, Fisher information, global information, information surface, item information, item response theory, Kullback-Leibler information, local information, test information. %B Applied Psychological Measurement %V 20 %P 213-229 %N 3 %@ 0146-6216 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the NCME %D 1996 %T A model for score maximization within a computerized adaptive testing environment %A Chang, Hua-Hua %B Paper presented at the annual meeting of the NCME %C New York NY %G eng %0 Generic %D 1996 %T Recursive maximum likelihood estimation, sequential design, and computerized adaptive testing %A Chang, Hua-Hua %A Ying, Z.
%C Princeton NJ: Educational Testing Service %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1995 %T The effect of population distribution and methods of theta estimation on CAT using the rating scale model %A Chen, S. %A Hou, L. %A Fitzpatrick, S. J. %A Dodd, B. G. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco %G eng %0 Conference Paper %B Paper presented at the meeting of the National Council on Measurement in Education %D 1995 %T Equating the CAT-ASVAB: Issues and approach %A Segall, D. O. %A Carter, G. %B Paper presented at the meeting of the National Council on Measurement in Education %C San Francisco %G eng %0 Generic %D 1995 %T An evaluation of alternative concepts for administering the Armed Services Vocational Aptitude Battery to applicants for enlistment %A Hogan, P.F. %A J. R. McBride %A Curran, L. T. %C DMDC Technical Report 95-013. Monterey, CA: Personnel Testing Division, Defense Manpower Data Center %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1995 %T A global information approach to computerized adaptive testing %A Chang, Hua-Hua %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco CA %G eng %0 Book %D 1995 %T Item equivalence from paper-and-pencil to computer adaptive testing %A Chae, S. %C Unpublished doctoral dissertation, University of Chicago %G eng %0 Conference Paper %B Paper presented at the Eleventh Workshop on Item Response Theory %D 1995 %T Recursive maximum likelihood estimation, sequential designs, and computerized adaptive testing %A Ying, Z. 
%A Chang, Hua-Hua %B Paper presented at the Eleventh Workshop on Item Response Theory %C University of Twente, the Netherlands %G eng %0 Journal Article %J Educational Technology Systems %D 1994 %T Computer adaptive testing: A shift in the evaluation paradigm %A Carlson, R. %B Educational Technology Systems %V 22 %N 3 %P 213-224 %G eng %0 Journal Article %J Applied Measurement in Education %D 1994 %T Computerized-adaptive and self-adapted music-listening tests: Features and motivational benefits %A Vispoel, W. P. %A Coffman, D. D. %B Applied Measurement in Education %V 7 %P 25-51 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Psychological Association %D 1994 %T Evaluation and implementation of CAT-ASVAB %A Curran, L. T. %A Wise, L. L. %B Paper presented at the annual meeting of the American Psychological Association %C Los Angeles %G eng %0 Journal Article %J Dissertation Abstracts International %D 1993 %T Computer adaptive testing: A comparison of four item selection strategies when used with the golden section search strategy for estimating ability %A Carlson, R. D. %K computerized adaptive testing %B Dissertation Abstracts International %V 54 %P 1772 %G eng %0 Journal Article %J Nursing Management %D 1993 %T Moving in a new direction: Computerized adaptive testing (CAT) %A Jones-Dickson, C. %A Dorsey, D. %A Campbell-Warnock, J. %A Fields, F. %K *Computers %K Accreditation/methods %K Educational Measurement/*methods %K Licensure, Nursing %K United States %B Nursing Management %7 1993/01/01 %V 24 %P 80, 82 %8 Jan %@ 0744-6314 (Print) %G eng %M 8418441 %0 Journal Article %J Bulletin of the Council for Research in Music Education %D 1992 %T Computerized adaptive testing of music-related skills %A Vispoel, W. P. %A Coffman, D. D. %B Bulletin of the Council for Research in Music Education %V 112 %P 29-49 %G eng %0 Book Section %D 1992 %T The development of alternative operational concepts %A J. R. McBride %A Curran, L. T.
%C Proceedings of the 34th Annual Conference of the Military Testing Association. San Diego, CA: Navy Personnel Research and Development Center. %G eng %0 Book %D 1992 %T Manual for the General Scholastic Aptitude Test (Senior) Computerized adaptive test %A Von Tonder, M. %A Claassen, N. C. W. %C Pretoria: Human Sciences Research Council %G eng %0 Journal Article %J Dissertation Abstracts International %D 1991 %T Inter-subtest branching in computerized adaptive testing %A Chang, S-H. %K computerized adaptive testing %B Dissertation Abstracts International %V 52 %P 140-141 %G eng %0 Conference Paper %B Paper presented at the ADCIS 32nd International Conference %D 1990 %T MusicCAT: An adaptive testing program to assess musical ability %A Vispoel, W. P. %A Coffman, D. %A Scriven, D. %B Paper presented at the ADCIS 32nd International Conference %C San Diego CA %G eng %0 Journal Article %J Applied Psychological Measurement %D 1989 %T Adaptive and conventional versions of the DAT: The first complete test battery comparison %A Henly, S. J. %A Klebe, K. J. %A J. R. McBride %A Cudeck, R. %B Applied Psychological Measurement %V 13 %P 363-371 %N 4 %G eng %0 Book %D 1988 %T Application of appropriateness measurement to a problem in computerized adaptive testing %A Candell, G. L. %C Unpublished doctoral dissertation, University of Illinois %G eng %0 Generic %D 1988 %T Refinement of the Computerized Adaptive Screening Test (CAST) (Final Report, Contract No. MDA203 06-C-0373) %A Wise, L. L. %A McHenry, J.J. %A Chia, W.J. %A Szenas, P.L. %A J. R. McBride %C Washington, DC: American Institutes for Research.
%G eng %0 Conference Paper %B Paper presented at the meeting of the American Psychological Association %D 1987 %T Equating the computerized adaptive edition of the Differential Aptitude Tests %A J. R. McBride %A Corpe, V. A. %A Wing, H. %B Paper presented at the meeting of the American Psychological Association %C New York %G eng %0 Journal Article %J Multivariate Behavioral Research %D 1985 %T A structural comparison of conventional and adaptive versions of the ASVAB %A Cudeck, R. %X Examined several structural models of similarity between the Armed Services Vocational Aptitude Battery (ASVAB) and a battery of computerized adaptive tests designed to measure the same aptitudes. 12 plausible models were fitted to sample data in a double cross-validation design. 1,411 US Navy recruits completed 10 ASVAB subtests. A computerized adaptive test version of the ASVAB subtests was developed on item pools of approximately 200 items each. The items were pretested using applicants from military entrance processing stations across the US, resulting in a total calibration sample size of approximately 60,000 for the computerized adaptive tests. Three of the 12 models provided reasonable summaries of the data. One model with a multiplicative structure (M. W. Browne; see record 1984-24964-001) performed quite well. This model provides an estimate of the disattenuated method correlation between conventional testing and adaptive testing. In the present data, this correlation was estimated to be 0.97 and 0.98 in the 2 halves of the data. Results support computerized adaptive tests as replacements for conventional tests. (33 ref) (PsycINFO Database Record (c) 2004 APA, all rights reserved). %B Multivariate Behavioral Research %V 20 %P 305-322 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1984 %T Computerized diagnostic testing %A McArthur, D. L. %A Choppin, B. H.
%B Journal of Educational Measurement %V 21 %P 391-397 %0 Generic %D 1982 %T Computerized adaptive testing system design: Preliminary design considerations (Tech. Report 82-52) %A Croll, P. R. %C San Diego CA: Navy Personnel Research and Development Center. (AD A118 495) %G eng %0 Generic %D 1981 %T A comparison of two methods of interactive testing: Final report %A Nicewander, W. A. %A Chang, H. S. %A Doody, E. N. %C National Institute of Education Grant 79-1045 %G eng %0 Book %D 1981 %T Effect of error in item parameter estimates on adaptive testing (Doctoral dissertation, University of Minnesota) %A Crichton, L. I. %C Dissertation Abstracts International, 42, 06-B %G eng %0 Generic %D 1980 %T Effects of computerized adaptive testing on Black and White students (Research Report 79-2) %A Pine, S. M. %A Church, A. T. %A Gialluca, K. A. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Journal Article %J Applied Psychological Measurement %D 1980 %T Implied Orders Tailored Testing: Simulation with the Stanford-Binet %A Cudeck, R. %A McCormick, D. J. %A N. Cliff %B Applied Psychological Measurement %V 4 %P 157-163 %G English %N 2 %0 Journal Article %J Applied Psychological Measurement %D 1979 %T Evaluation of Implied Orders as a Basis for Tailored Testing with Simulation Data %A N. Cliff %A Cudeck, R. %A McCormick, D. J. %B Applied Psychological Measurement %V 3 %P 495-514 %G English %N 4
%0 Journal Article %J Applied Psychological Measurement %D 1979 %T Monte Carlo Evaluation of Implied Orders As a Basis for Tailored Testing %A Cudeck, R. %A McCormick, D. %A N. Cliff %B Applied Psychological Measurement %V 3 %P 65-74 %G English %N 1 %0 Journal Article %J Journal of Speech and Hearing Disorders %D 1978 %T Combining auditory and visual stimuli in the adaptive testing of speech discrimination %A Steele, J. A. %A Binnie, C. A. %A Cooper, W. A. %B Journal of Speech and Hearing Disorders %V 43 %P 115-122 %G eng %0 Generic %D 1978 %T Evaluations of implied orders as a basis for tailored testing using simulations (Technical Report No. 4) %A Cliff, N. A. %A Cudeck, R. %A McCormick, D. %C Los Angeles CA: University of Southern California, Department of Psychology. %G eng %0 Generic %D 1978 %T Implied orders as a basis for tailored testing (Technical Report No. 6) %A Cliff, N. A. %A Cudeck, R. %A McCormick, D. %C Los Angeles CA: University of Southern California, Department of Psychology. %G eng %0 Book Section %D 1977 %T An empirical evaluation of implied orders as a basis for tailored testing %A Cliff, N. A. %A Cudeck, R. %A McCormick, D. %C D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1977 %T TAILOR: A FORTRAN procedure for interactive tailored testing %A Cudeck, R. A. %A Cliff, N. A. %A Kehoe, J.
%B Educational and Psychological Measurement %V 37 %P 767-769 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1977 %T TAILOR-APL: An interactive computer program for individual tailored testing %A McCormick, D. %A Cliff, N. A. %B Educational and Psychological Measurement %V 37 %P 771-774 %G eng %0 Journal Article %J Psychometrika %D 1977 %T A theory of consistency ordering generalizable to tailored testing %A Cliff, N. A. %B Psychometrika %P 375-399 %G eng %0 Generic %D 1976 %T Elements of a basic test theory generalizable to tailored testing %A Cliff, N. A. %C Unpublished manuscript %G eng %0 Book %D 1976 %T Incomplete orders and computerized testing %A Cliff, N. A. %C In C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 18-23). Washington DC: U.S. Government Printing Office. %G eng %0 Generic %D 1976 %T Monte Carlo results from a computer program for tailored testing (Technical Report No. 2) %A Cudeck, R. A. %A Cliff, N. A. %A Reynolds, T. J. %A McCormick, D. J. %C Los Angeles CA: University of Southern California, Department of Psychology. %G eng %0 Book %D 1976 %T Proceedings of the first conference on computerized adaptive testing %A Clark, C. K. %C Washington DC: U.S. Government Printing Office %G eng %0 Book Section %D 1976 %T Using computerized tests to add new dimensions to the measurement of abilities which are important for on-job performance: An exploratory study %A Cory, C. H. %C C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 64-74). Washington DC: U.S. Government Printing Office. %G eng %0 Generic %D 1975 %T A basic test theory generalizable to tailored testing (Technical Report No. 1) %A Cliff, N. A. %C Los Angeles CA: University of Southern California, Department of Psychology. %G eng %0 Journal Article %J Psychological Bulletin %D 1975 %T Complete orders from incomplete data: Interactive ordering and tailored testing %A Cliff, N. A.
%B Psychological Bulletin %V 82 %P 259-302 %G eng %0 Conference Paper %B Paper presented at the 86th Annual Convention of the American Psychological Association. Toronto %D 1975 %T Tailored testing: Maximizing validity and utility for job selection %A Croll, P. R. %A Urry, V. W. %B Paper presented at the 86th Annual Convention of the American Psychological Association. Toronto %C Canada %G eng %0 Conference Paper %B NATO Conference on Utilisation of Human Resources %D 1973 %T The potential use of tailored testing for allocation to army employments %A Killcross, M. C. %A Cassie, A. %B NATO Conference on Utilisation of Human Resources %C Lisbon, Portugal %8 06/1973 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1972 %T Sequential testing for dichotomous decisions %A Linn, R. L. %A Rock, D. A. %A Cleary, T. A. %K CCAT %K CLASSIFICATION Computerized Adaptive Testing %K sequential probability ratio testing %K SPRT %B Educational and Psychological Measurement %V 32 %P 85-95 %G eng %0 Generic %D 1970 %T Sequential testing for dichotomous decisions. College Entrance Examination Board Research and Development Report (RDR 69-70, No. 3, and Educational Testing Service RB-70-31) %A Linn, R. L. %A Rock, D. A. %A Cleary, T. A. %C Princeton NJ: Educational Testing Service. %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1969 %T The development and evaluation of several programmed testing methods %A Linn, R. L. %A Cleary, T. A. %B Educational and Psychological Measurement %V 29 %P 129-146 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1968 %T An exploratory study of programmed tests %A Cleary, T. A. %A Linn, R. L. %A Rock, D. A. %B Educational and Psychological Measurement %V 28 %P 345-360 %G eng %0 Generic %D 1968 %T The development and evaluation of several programmed testing methods (Research Bulletin 68-5) %A Linn, R. L. %A Rock, D. A. %A Cleary, T. A.
%C Princeton NJ: Educational Testing Service %G eng %0 Journal Article %J Journal of Educational Measurement %D 1968 %T Reproduction of total test score through the use of sequential programmed tests %A Cleary, T. A. %A Linn, R. L. %A Rock, D. A. %B Journal of Educational Measurement %V 5 %P 183-187 %G eng %0 Book Section %D 1966 %T New light on test strategy from decision theory %A Cronbach, L. J. %C A. Anastasi (Ed.). Testing problems in perspective. Washington DC: American Council on Education. %G eng %0 Journal Article %J Journal of the American Statistical Association %D 1946 %T An application of sequential sampling to testing students %A Cowden, D. J. %B Journal of the American Statistical Association %V 41 %P 547-556 %G eng