TY - JOUR T1 - Development of a Computerized Adaptive Test for Anxiety Based on the Dutch–Flemish Version of the PROMIS Item Bank JF - Assessment Y1 - In Press A1 - Gerard Flens A1 - Niels Smits A1 - Caroline B. Terwee A1 - Joost Dekker A1 - Irma Huijbrechts A1 - Philip Spinhoven A1 - Edwin de Beurs AB - We used the Dutch–Flemish version of the USA PROMIS adult V1.0 item bank for Anxiety as input for developing a computerized adaptive test (CAT) to measure the entire latent anxiety continuum. First, psychometric analysis of a combined clinical and general population sample (N = 2,010) showed that the 29-item bank has psychometric properties that are required for a CAT administration. Second, a post hoc CAT simulation showed efficient and highly precise measurement, with an average number of 8.64 items for the clinical sample, and 9.48 items for the general population sample. Furthermore, the accuracy of our CAT version was highly similar to that of the full item bank administration, both in final score estimates and in distinguishing clinical subjects from persons without a mental health disorder. We discuss the future directions and limitations of CAT development with the Dutch–Flemish version of the PROMIS Anxiety item bank. UR - https://doi.org/10.1177/1073191117746742 ER - TY - JOUR T1 - How Do Trait Change Patterns Affect the Performance of Adaptive Measurement of Change? JF - Journal of Computerized Adaptive Testing Y1 - 2023 A1 - Ming Him Tai A1 - Allison W. Cooperman A1 - Joseph N. DeWeese A1 - David J. Weiss KW - adaptive measurement of change KW - computerized adaptive testing KW - longitudinal measurement KW - trait change patterns VL - 10 IS - 3 ER - TY - JOUR T1 - The (non)Impact of Misfitting Items in Computerized Adaptive Testing JF - Journal of Computerized Adaptive Testing Y1 - 2022 A1 - Christine E. DeMars KW - computerized adaptive testing KW - item fit KW - three-parameter logistic model VL - 9 UR - https://jcatpub.net/index.php/jcat/issue/view/26 IS - 2 ER - TY - JOUR T1 - A Blocked-CAT Procedure for CD-CAT JF - Applied Psychological Measurement Y1 - 2020 A1 - Mehmet Kaplan A1 - Jimmy de la Torre AB - This article introduces a blocked-design procedure for cognitive diagnosis computerized adaptive testing (CD-CAT), which allows examinees to review items and change their answers during test administration. Four blocking versions of the new procedure were proposed. In addition, the impact of several factors, namely, item quality, generating model, block size, and test length, on the classification rates was investigated. Three popular item selection indices in CD-CAT were used and their efficiency compared using the new procedure. An additional study was carried out to examine the potential benefit of item review. The results showed that the new procedure is promising in that allowing item review resulted only in a small loss in attribute classification accuracy under some conditions. Moreover, using a blocked-design CD-CAT is beneficial to the extent that it alleviates the negative impact of test anxiety on examinees’ true performance. VL - 44 UR - https://doi.org/10.1177/0146621619835500 ER - TY - JOUR T1 - Computerized adaptive testing to screen children for emotional and behavioral problems by preventive child healthcare JF - BMC Pediatrics Y1 - 2020 A1 - Theunissen, Meninou H.C. A1 - de Wolff, Marianne S. A1 - Deurloo, Jacqueline A. A1 - Vogels, Anton G. C. AB -

Background

Questionnaires to detect emotional and behavioral problems (EBP) in Preventive Child Healthcare (PCH) should be short, which potentially affects their validity and reliability. Simulation studies have shown that Computerized Adaptive Testing (CAT) could overcome these weaknesses. We studied the applicability (in terms of participation rate, satisfaction, and efficiency) and the validity of CAT in routine PCH practice.

Methods

We analyzed data on 461 children aged 10–11 years (response 41%), who were assessed during routine well-child examinations by PCH professionals. Before the visit, parents completed the CAT and the Child Behavior Checklist (CBCL). Satisfaction was measured by parent and PCH professional report. Efficiency of the CAT procedure was measured as the number of items needed to assess whether a child has serious problems or not. Its validity was assessed using the CBCL as the criterion.

Results

Parents and PCH professionals rated the CAT on average as good. The procedure required on average 16 items to assess whether a child has serious problems or not. Agreement of scores on the CAT scales with corresponding CBCL scales was high (range of Spearman correlations 0.59–0.72). Areas under the curve (AUC) were high (range: 0.95–0.97) for the Psycat total, externalizing, and hyperactivity scales using corresponding CBCL scale scores as criterion. For the Psycat internalizing scale, the AUC was somewhat lower but still high (0.86).

Conclusions

CAT is a valid procedure for the identification of emotional and behavioral problems in children aged 10–11 years. It may support the efficient and accurate identification of children with overall, and potentially also specific, emotional and behavioral problems in routine PCH.

VL - 20 UR - https://bmcpediatr.biomedcentral.com/articles/10.1186/s12887-020-2018-1 IS - Article number: 119 ER - TY - JOUR T1 - Multidimensional Test Assembly Using Mixed-Integer Linear Programming: An Application of Kullback–Leibler Information JF - Applied Psychological Measurement Y1 - 2020 A1 - Dries Debeer A1 - Peter W. van Rijn A1 - Usama S. Ali AB - Many educational testing programs require different test forms with minimal or no item overlap. At the same time, the test forms should be parallel in terms of their statistical and content-related properties. A well-established method to assemble parallel test forms is to apply combinatorial optimization using mixed-integer linear programming (MILP). Using this approach, in the unidimensional case, Fisher information (FI) is commonly used as the statistical target to obtain parallelism. In the multidimensional case, however, FI is a multidimensional matrix, which complicates its use as a statistical target. Previous research addressing this problem focused on item selection criteria for multidimensional computerized adaptive testing (MCAT). Yet these selection criteria are not directly transferable to the assembly of linear parallel test forms. To bridge this gap the authors derive different statistical targets, based on either FI or the Kullback–Leibler (KL) divergence, that can be applied in MILP models to assemble multidimensional parallel test forms. Using simulated item pools and an item pool based on empirical items, the proposed statistical targets are compared and evaluated. Promising results with respect to the KL-based statistical targets are presented and discussed. VL - 44 UR - https://doi.org/10.1177/0146621619827586 ER - TY - JOUR T1 - Computerized Adaptive Testing for Cognitively Based Multiple-Choice Data JF - Applied Psychological Measurement Y1 - 2019 A1 - Hulya D. Yigit A1 - Miguel A. Sorrel A1 - Jimmy de la Torre AB - Cognitive diagnosis models (CDMs) are latent class models that hold great promise for providing diagnostic information about student knowledge profiles. The increasing use of computers in classrooms enhances the advantages of CDMs for more efficient diagnostic testing by using adaptive algorithms, referred to as cognitive diagnosis computerized adaptive testing (CD-CAT). When multiple-choice items are involved, CD-CAT can be further improved by using polytomous scoring (i.e., considering the specific options students choose), instead of dichotomous scoring (i.e., marking answers as either right or wrong). In this study, the authors propose and evaluate the performance of the Jensen–Shannon divergence (JSD) index as an item selection method for the multiple-choice deterministic inputs, noisy “and” gate (MC-DINA) model. Attribute classification accuracy and item usage are evaluated under different conditions of item quality and test termination rule. The proposed approach is compared with the random selection method and an approximate approach based on dichotomized responses. The results show that under the MC-DINA model, JSD improves the attribute classification accuracy significantly by considering the information from distractors, even with a very short test length. This result has important implications in practical classroom settings as it can allow for dramatically reduced testing times, thus resulting in more targeted learning opportunities. 
VL - 43 UR - https://doi.org/10.1177/0146621618798665 ER - TY - JOUR T1 - Developing Multistage Tests Using D-Scoring Method JF - Educational and Psychological Measurement Y1 - 2019 A1 - Kyung (Chris) T. Han A1 - Dimiter M. Dimitrov A1 - Faisal Al-Mashary AB - The D-scoring method for scoring and equating tests with binary items proposed by Dimitrov offers some of the advantages of item response theory, such as item-level difficulty information and score computation that reflects the item difficulties, while retaining the merits of classical test theory such as the simplicity of number correct score computation and relaxed requirements for model sample sizes. Because of its unique combination of those merits, the D-scoring method has seen quick adoption in the educational and psychological measurement field. Because item-level difficulty information is available with the D-scoring method and item difficulties are reflected in test scores, it conceptually makes sense to use the D-scoring method with adaptive test designs such as multistage testing (MST). In this study, we developed and compared several versions of the MST mechanism using the D-scoring approach and also proposed and implemented a new framework for conducting MST simulation under the D-scoring method. Our findings suggest that the score recovery performance under MST with D-scoring was promising, as it retained score comparability across different MST paths. We found that MST using the D-scoring method can achieve improvements in measurement precision and efficiency over linear-based tests that use D-scoring method. VL - 79 UR - https://doi.org/10.1177/0013164419841428 ER - TY - JOUR T1 - Constructing Shadow Tests in Variable-Length Adaptive Testing JF - Applied Psychological Measurement Y1 - 2018 A1 - Qi Diao A1 - Hao Ren AB - Imposing content constraints is very important in most operational computerized adaptive testing (CAT) programs in educational measurement. Shadow test approach to CAT (Shadow CAT) offers an elegant solution to imposing statistical and nonstatistical constraints by projecting future consequences of item selection. The original form of Shadow CAT presumes fixed test lengths. The goal of the current study was to extend Shadow CAT to tests under variable-length termination conditions and evaluate its performance relative to other content balancing approaches. The study demonstrated the feasibility of constructing Shadow CAT with variable test lengths and in operational CAT programs. The results indicated the superiority of the approach compared with other content balancing methods. VL - 42 UR - https://doi.org/10.1177/0146621617753736 ER - TY - JOUR T1 - A Continuous a-Stratification Index for Item Exposure Control in Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2018 A1 - Alan Huebner A1 - Chun Wang A1 - Bridget Daly A1 - Colleen Pinkelman AB - The method of a-stratification aims to reduce item overexposure in computerized adaptive testing, as items that are administered at very high rates may threaten the validity of test scores. In existing methods of a-stratification, the item bank is partitioned into a fixed number of nonoverlapping strata according to the items’a, or discrimination, parameters. This article introduces a continuous a-stratification index which incorporates exposure control into the item selection index itself and thus eliminates the need for fixed discrete strata. 
The new continuous a-stratification index is compared with existing stratification methods via simulation studies in terms of ability estimation bias, mean squared error, and control of item exposure rates. VL - 42 UR - https://doi.org/10.1177/0146621618758289 ER - TY - JOUR T1 - ATS-PD: An Adaptive Testing System for Psychological Disorders JF - Educational and Psychological Measurement Y1 - 2017 A1 - Ivan Donadello A1 - Andrea Spoto A1 - Francesco Sambo A1 - Silvana Badaloni A1 - Umberto Granziol A1 - Giulio Vidotto AB - The clinical assessment of mental disorders can be a time-consuming and error-prone procedure, consisting of a sequence of diagnostic hypothesis formulation and testing aimed at restricting the set of plausible diagnoses for the patient. In this article, we propose a novel computerized system for the adaptive testing of psychological disorders. The proposed system combines a mathematical representation of psychological disorders, known as the “formal psychological assessment,” with an algorithm designed for the adaptive assessment of an individual’s knowledge. The assessment algorithm is extended and adapted to the new application domain. Testing the system on a real sample of 4,324 healthy individuals, screened for obsessive-compulsive disorder, we demonstrate the system’s ability to support clinical testing, both by identifying the correct critical areas for each individual and by reducing the number of posed questions with respect to a standard written questionnaire. VL - 77 UR - https://doi.org/10.1177/0013164416652188 ER - TY - CONF T1 - Bayesian Perspectives on Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Wim J. van der Linden A1 - Bingnan Jiang A1 - Hao Ren A1 - Seung W. Choi A1 - Qi Diao KW - Bayesian Perspective KW - CAT AB -

Although adaptive testing is usually treated from the perspective of maximum-likelihood parameter estimation and maximum-information item selection, a Bayesian perspective is more natural, statistically efficient, and computationally tractable. This observation holds not only for the core process of ability estimation but also for such processes as item calibration and real-time monitoring of item security. Key elements of the approach are parametric modeling of each relevant process, updating of the parameter estimates after the arrival of each new response, and optimal design of the next step.

The purpose of the symposium is to illustrate the role of Bayesian statistics in this approach. The first presentation discusses a basic Bayesian algorithm for the sequential update of any parameter in adaptive testing and illustrates the idea of Bayesian optimal design for the two processes of ability estimation and online item calibration. The second presentation generalizes the ideas to the case of adaptive testing with polytomous items. The third presentation uses the fundamental Bayesian idea of sampling from updated posterior predictive distributions (“multiple imputations”) to deal with the problem of scoring incomplete adaptive tests.
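
As a rough illustration of the sequential Bayesian updating described in this abstract, the sketch below updates a grid approximation of the ability posterior after each new response under a 2PL model; the grid, prior, and item parameters are illustrative and are not taken from the symposium presentations.

    import numpy as np

    grid = np.linspace(-4, 4, 161)                 # discrete support for theta
    posterior = np.exp(-0.5 * grid**2)             # standard-normal prior (unnormalized)
    posterior /= posterior.sum()

    def p_correct(theta, a, b):
        """2PL probability of a correct response."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def update(posterior, a, b, response):
        """Multiply the posterior by the likelihood of one observed response."""
        p = p_correct(grid, a, b)
        posterior = posterior * (p if response == 1 else 1.0 - p)
        return posterior / posterior.sum()

    # After each administered item: update, then recompute the EAP estimate and posterior SD.
    posterior = update(posterior, a=1.2, b=0.3, response=1)
    eap = np.sum(grid * posterior)
    psd = np.sqrt(np.sum((grid - eap) ** 2 * posterior))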

Session Video 1

Session Video 2

 

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - JOUR T1 - Development of a Computer Adaptive Test for Depression Based on the Dutch-Flemish Version of the PROMIS Item Bank JF - Evaluation & the Health Professions Y1 - 2017 A1 - Gerard Flens A1 - Niels Smits A1 - Caroline B. Terwee A1 - Joost Dekker A1 - Irma Huijbrechts A1 - Edwin de Beurs AB - We developed a Dutch-Flemish version of the patient-reported outcomes measurement information system (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that failed to make the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample (N = 2,010), 8 added items were removed. With the final item bank, we performed several CAT simulations to assess the efficiency of the extended (48 items) and the original item bank (28 items), using various stopping rules. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development. VL - 40 UR - https://doi.org/10.1177/0163278716684168 ER - TY - JOUR T1 - The Development of MST Test Information for the Prediction of Test Performances JF - Educational and Psychological Measurement Y1 - 2017 A1 - Ryoungsun Park A1 - Jiseon Kim A1 - Hyewon Chung A1 - Barbara G. Dodd AB - The current study proposes novel methods to predict multistage testing (MST) performance without conducting simulations. This method, called MST test information, is based on analytic derivation of standard errors of ability estimates across theta levels. We compared standard errors derived analytically to the simulation results to demonstrate the validity of the proposed method in both measurement precision and classification accuracy. The results indicate that the MST test information effectively predicted the performance of MST. In addition, the results of the current study highlighted the relationship among the test construction, MST design factors, and MST performance. VL - 77 UR - http://dx.doi.org/10.1177/0013164416662960 ER - TY - JOUR T1 - Projection-Based Stopping Rules for Computerized Adaptive Testing in Licensure Testing JF - Applied Psychological MeasurementApplied Psychological Measurement Y1 - 2017 A1 - Luo, Xiao A1 - Kim, Doyoung A1 - Dickison, Philip AB - The confidence interval (CI) stopping rule is commonly used in licensure settings to make classification decisions with fewer items in computerized adaptive testing (CAT). However, it tends to be less efficient in the near-cut regions of the ? scale, as the CI often fails to be narrow enough for an early termination decision prior to reaching the maximum test length. To solve this problem, this study proposed the projection-based stopping rules that base the termination decisions on the algorithmically projected range of the final ? estimate at the hypothetical completion of the CAT. A simulation study and an empirical study were conducted to show the advantages of the projection-based rules over the CI rule, in which the projection-based rules reduced the test length without jeopardizing critical psychometric qualities of the test, such as the ? and classification precision. 
Operationally, these rules do not require additional regularization parameters, because the projection is simply a hypothetical extension of the current test within the existing CAT environment. Because these new rules are specifically designed to address the decreased efficiency in the near-cut regions as opposed to for the entire scale, the authors recommend using them in conjunction with the CI rule in practice. VL - 42 SN - 0146-6216 UR - https://doi.org/10.1177/0146621617726790 IS - 4 JO - Applied Psychological Measurement ER - TY - CONF T1 - Using Automated Item Generation in a Large-scale Medical Licensure Exam Program: Lessons Learned. T2 - 2017 IACAT Conference Y1 - 2017 A1 - André F. De Champlain KW - Automated item generation KW - large scale KW - medical licensure AB -

On-demand testing has become commonplace with most large-scale testing programs. Continuous testing is appealing for candidates in that it affords greater flexibility in scheduling a session at the desired location. Furthermore, the push for more comprehensive systems of assessment (e.g., CBAL) is predicated on the availability of more frequently administered tasks given the purposeful link between instruction and assessment in these frameworks. However, continuous testing models pose several challenges for programs, including overexposure of items. Robust item banks are therefore needed to support routine retirement and replenishment of items. In a traditional approach to developing items, content experts select a topic and then develop an item consisting of a stem, a lead-in question, a correct answer, and a list of distractors. The item then undergoes review by a panel of experts to validate the content and identify any potential flaws. The process involved in developing quality MCQ items can be time-consuming as well as costly, with estimates as high as $1500-$2500 USD per item (Rudner, 2010). The Medical Council of Canada (MCC) has been exploring a novel item development process to supplement traditional approaches. Specifically, the use of automated item generation (AIG), which uses technology to generate test items from cognitive models, has been studied for over five years. Cognitive models are representations of the knowledge and skills that are required to solve any given problem. While developing a cognitive model for a medical scenario, for example, content experts are asked to deconstruct the (clinical) reasoning process involved via clearly stated variables and related elements. The latter information is then entered into a computer program that uses algorithms to generate MCQs. The MCC has been piloting AIG-based items for over five years with the MCC Qualifying Examination Part I (MCCQE I), a pre-requisite for licensure in Canada. The aim of this presentation is to provide an overview of the practical lessons learned in the use and operational rollout of AIG with the MCCQE I. Psychometrically, the quality of the items is at least equal, and in many instances superior, to that of traditionally written MCQs, based on difficulty, discrimination, and information. In fact, 96% of the AIG-based items piloted in a recent administration were retained for future operational scoring based on pre-defined inclusion criteria. AIG also offers a framework for the systematic creation of plausible distractors, in that the content experts not only need to provide the clinical reasoning underlying a correct response but also the cognitive errors associated with each of the distractors (Lai et al., 2016). Consequently, AIG holds great promise in regard to improving and tailoring diagnostic feedback for remedial purposes (Pugh, De Champlain, Gierl, Lai, & Touchie, 2016). Furthermore, our test development process has been greatly enhanced by the addition of AIG as it requires that item writers use metacognitive skills to describe how they solve problems. We are hopeful that sharing our experiences with attendees might not only help other testing organizations interested in adopting AIG, but also foster discussion which might benefit all participants.
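
As a toy illustration of the template-based generation step described above (the template, variables, and wording below are invented for this example and are not MCC content or the MCC's generation software), a small script can enumerate candidate item stems from a cognitive model's variables:

    from itertools import product

    # Toy "cognitive model": each variable lists the values the item writer allows.
    template = ("A {age}-year-old {sex} presents with {finding}. "
                "What is the most likely diagnosis?")
    variables = {
        "age": ["25", "68"],
        "sex": ["man", "woman"],
        "finding": ["acute chest pain", "painless jaundice"],
    }

    # Enumerate every combination of variable values to generate candidate stems.
    keys = list(variables)
    stems = [template.format(**dict(zip(keys, combo)))
             for combo in product(*(variables[k] for k in keys))]
    print(len(stems), "candidate stems generated")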

References

Lai, H., Gierl, M.J., Touchie, C., Pugh, D., Boulais, A.P., & De Champlain, A.F. (2016). Using automatic item generation to improve the quality of MCQ distractors. Teaching and Learning in Medicine, 28, 166-173.

Pugh, D., De Champlain, A.F., Lai, H., Gierl, M., & Touchie, C. (2016). Using cognitive models to develop quality multiple choice questions. Medical Teacher, 38, 838-843.

Rudner, L. (2010). Implementing the Graduate Management Admission Test Computerized Adaptive Test. In W. van der Linden & C. Glas (Eds.), Elements of adaptive testing (pp. 151-165). New York, NY: Springer.

Presentation Video

JF - 2017 IACAT Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=14N8hUc8qexAy5W_94TykEDABGVIJHG1h ER - TY - JOUR T1 - The validation of a computer-adaptive test (CAT) for assessing health-related quality of life in children and adolescents in a clinical sample: study design, methods and first results of the Kids-CAT study JF - Quality of Life Research Y1 - 2017 A1 - Barthel, D. A1 - Otto, C. A1 - Nolte, S. A1 - Meyrose, A.-K. A1 - Fischer, F. A1 - Devine, J. A1 - Walter, O. A1 - Mierke, A. A1 - Fischer, K. I. A1 - Thyen, U. A1 - Klein, M. A1 - Ankermann, T. A1 - Rose, M. A1 - Ravens-Sieberer, U. AB - Recently, we developed a computer-adaptive test (CAT) for assessing health-related quality of life (HRQoL) in children and adolescents: the Kids-CAT. It measures five generic HRQoL dimensions. The aims of this article were (1) to present the study design and (2) to investigate its psychometric properties in a clinical setting. VL - 26 UR - https://doi.org/10.1007/s11136-016-1437-9 ER - TY - JOUR T1 - Exploration of Item Selection in Dual-Purpose Cognitive Diagnostic Computerized Adaptive Testing: Based on the RRUM JF - Applied Psychological Measurement Y1 - 2016 A1 - Dai, Buyun A1 - Zhang, Minqiang A1 - Li, Guangming AB - Cognitive diagnostic computerized adaptive testing (CD-CAT) can be divided into two broad categories: (a) single-purpose tests, which are based on the subject’s knowledge state (KS) alone, and (b) dual-purpose tests, which are based on both the subject’s KS and traditional ability level ( ). This article seeks to identify the most efficient item selection method for the latter type of CD-CAT corresponding to various conditions and various evaluation criteria, respectively, based on the reduced reparameterized unified model (RRUM) and the two-parameter logistic model of item response theory (IRT-2PLM). The Shannon entropy (SHE) and Fisher information methods were combined to produce a new synthetic item selection index, that is, the “dapperness with information (DWI)” index, which concurrently considers both KS and within one step. The new method was compared with four other methods. The results showed that, in most conditions, the new method exhibited the best performance in terms of KS estimation and the second-best performance in terms of estimation. Item utilization uniformity and computing time are also considered for all the competing methods. VL - 40 UR - http://apm.sagepub.com/content/40/8/625.abstract ER - TY - JOUR T1 - Hybrid Computerized Adaptive Testing: From Group Sequential Design to Fully Sequential Design JF - Journal of Educational Measurement Y1 - 2016 A1 - Wang, Shiyu A1 - Lin, Haiyan A1 - Chang, Hua-Hua A1 - Douglas, Jeff AB - Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing.  Though most designs of CAT and MST exhibit strength and weakness in recent large-scale implementations, there is no simple answer to the question of which design is better because different modes may fit different practical situations. This article proposes a hybrid adaptive framework to combine both CAT and MST, inspired by an analysis of the history of CAT and MST. The proposed procedure is a design which transitions from a group sequential design to a fully sequential design. 
This allows for the robustness of MST in early stages, but also shares the advantages of CAT in later stages with fine tuning of the ability estimator once its neighborhood has been identified. Simulation results showed that hybrid designs following our proposed principles provided comparable or even better estimation accuracy and efficiency than standard CAT and MST designs, especially for examinees at the two ends of the ability range. VL - 53 UR - http://dx.doi.org/10.1111/jedm.12100 ER - TY - JOUR T1 - New Item Selection Methods for Cognitive Diagnosis Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2015 A1 - Kaplan, Mehmet A1 - de la Torre, Jimmy A1 - Barrada, Juan Ramón AB - This article introduces two new item selection methods, the modified posterior-weighted Kullback–Leibler index (MPWKL) and the generalized deterministic inputs, noisy “and” gate (G-DINA) model discrimination index (GDI), that can be used in cognitive diagnosis computerized adaptive testing. The efficiency of the new methods is compared with the posterior-weighted Kullback–Leibler (PWKL) item selection index using a simulation study in the context of the G-DINA model. The impact of item quality, generating models, and test termination rules on attribute classification accuracy or test length is also investigated. The results of the study show that the MPWKL and GDI perform very similarly, and have higher correct attribute classification rates or shorter mean test lengths compared with the PWKL. In addition, the GDI has the shortest implementation time among the three indices. The proportion of item usage with respect to the required attributes across the different conditions is also tracked and discussed. VL - 39 UR - http://apm.sagepub.com/content/39/3/167.abstract ER - TY - JOUR T1 - A Comparison of Four Item-Selection Methods for Severely Constrained CATs JF - Educational and Psychological Measurement Y1 - 2014 A1 - He, Wei A1 - Diao, Qi A1 - Hauser, Carl AB -

This study compared four item-selection procedures developed for use with severely constrained computerized adaptive tests (CATs). Severely constrained CATs refer to those adaptive tests that seek to meet a complex set of constraints that are often not mutually exclusive (i.e., an item may contribute to the satisfaction of several constraints at the same time). The procedures examined in the study included the weighted deviation model (WDM), the weighted penalty model (WPM), the maximum priority index (MPI), and the shadow test approach (STA). In addition, two modified versions of the MPI procedure were introduced to deal with an edge case condition that results in the item selection procedure becoming dysfunctional during a test. The results suggest that the STA worked best among all candidate methods in terms of measurement accuracy and constraint management. The other three heuristic approaches did not differ significantly in measurement accuracy or in constraint management at the lower bound level. However, the WPM method appears to perform considerably better in overall constraint management than either the WDM or MPI method. Limitations and future research directions were also discussed.
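
As a rough sketch of how a priority-index style heuristic of this kind trades item information off against unfilled content constraints (the weighting below is a simplified illustration, not the exact WDM, WPM, or MPI formulas evaluated in the study):

    import math

    def fisher_info_2pl(theta, a, b):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        return a * a * p * (1.0 - p)

    def priority(item, theta, quota_left, quota_total):
        """Information weighted by the remaining room in the item's content area."""
        info = fisher_info_2pl(theta, item["a"], item["b"])
        area = item["area"]
        room = quota_left[area] / quota_total[area] if quota_total[area] else 0.0
        return info * room

    pool = [{"a": 1.1, "b": -0.5, "area": "algebra", "used": False},
            {"a": 0.8, "b": 0.7, "area": "geometry", "used": False},
            {"a": 1.4, "b": 0.1, "area": "algebra", "used": False}]
    quota_total = {"algebra": 2, "geometry": 1}
    quota_left = dict(quota_total)

    # Administer the unused item with the highest constraint-weighted information.
    best = max((it for it in pool if not it["used"]),
               key=lambda it: priority(it, 0.0, quota_left, quota_total))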

VL - 74 UR - http://epm.sagepub.com/content/74/4/677.abstract ER - TY - JOUR T1 - Enhancing Pool Utilization in Constructing the Multistage Test Using Mixed-Format Tests JF - Applied Psychological Measurement Y1 - 2014 A1 - Park, Ryoungsun A1 - Kim, Jiseon A1 - Chung, Hyewon A1 - Dodd, Barbara G. AB -

This study investigated a new pool utilization method of constructing multistage tests (MST) using the mixed-format test based on the generalized partial credit model (GPCM). MST simulations of a classification test were performed to evaluate the MST design. A linear programming (LP) model was applied to perform MST reassemblies based on the initial MST construction. Three subsequent MST reassemblies were performed. For each reassembly, three test unit replacement ratios (TRRs; 0.22, 0.44, and 0.66) were investigated. The conditions of the three passing rates (30%, 50%, and 70%) were also considered in the classification testing. The results demonstrated that various MST reassembly conditions increased the overall pool utilization rates, while maintaining the desired MST construction. All MST testing conditions performed equally well in terms of the precision of the classification decision.

VL - 38 UR - http://apm.sagepub.com/content/38/4/268.abstract ER - TY - JOUR T1 - A Comparison of Exposure Control Procedures in CAT Systems Based on Different Measurement Models for Testlets JF - Applied Measurement in Education Y1 - 2013 A1 - Boyd, Aimee M. A1 - Dodd, Barbara A1 - Fitzpatrick, Steven VL - 26 UR - http://www.tandfonline.com/doi/abs/10.1080/08957347.2013.765434 ER - TY - JOUR T1 - A Comparison of Exposure Control Procedures in CATs Using the 3PL Model JF - Educational and Psychological Measurement Y1 - 2013 A1 - Leroux, Audrey J. A1 - Lopez, Myriam A1 - Hembry, Ian A1 - Dodd, Barbara G. AB -

This study compares the progressive-restricted standard error (PR-SE) exposure control procedure to three commonly used procedures in computerized adaptive testing, the randomesque, Sympson–Hetter (SH), and no exposure control methods. The performance of these four procedures is evaluated using the three-parameter logistic model under the manipulated conditions of item pool size (small vs. large) and stopping rules (fixed-length vs. variable-length). PR-SE provides the advantage of similar constraints to SH, without the need for a preceding simulation study to execute it. Overall for the large and small item banks, the PR-SE method administered almost all of the items from the item pool, whereas the other procedures administered about 52% or less of the large item bank and 80% or less of the small item bank. The PR-SE yielded the smallest amount of item overlap between tests across conditions and administered fewer items on average than SH. PR-SE obtained these results with similar, and acceptable, measurement precision compared to the other exposure control procedures while vastly improving on item pool usage.

VL - 73 UR - http://epm.sagepub.com/content/73/5/857.abstract ER - TY - JOUR T1 - The Influence of Item Calibration Error on Variable-Length Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2013 A1 - Patton, Jeffrey M. A1 - Ying Cheng, A1 - Yuan, Ke-Hai A1 - Diao, Qi AB -

Variable-length computerized adaptive testing (VL-CAT) allows both items and test length to be “tailored” to examinees, thereby achieving the measurement goal (e.g., scoring precision or classification) with as few items as possible. Several popular test termination rules depend on the standard error of the ability estimate, which in turn depends on the item parameter values. However, items are chosen on the basis of their parameter estimates, and capitalization on chance may occur. In this article, the authors investigated the effects of capitalization on chance on test length and classification accuracy in several VL-CAT simulations. The results confirm that capitalization on chance occurs in VL-CAT and has complex effects on test length, ability estimation, and classification accuracy. These results have important implications for the design and implementation of VL-CATs.

VL - 37 UR - http://apm.sagepub.com/content/37/1/24.abstract ER - TY - JOUR T1 - Integrating Test-Form Formatting Into Automated Test Assembly JF - Applied Psychological Measurement Y1 - 2013 A1 - Diao, Qi A1 - van der Linden, Wim J. AB -

Automated test assembly uses the methodology of mixed integer programming to select an optimal set of items from an item bank. Automated test-form generation uses the same methodology to optimally order the items and format the test form. From an optimization point of view, production of fully formatted test forms directly from the item pool using a simultaneous optimization model is more attractive than any of the current, more time-consuming two-stage processes. The goal of this study was to provide such simultaneous models both for computer-delivered and paper forms, as well as explore their performances relative to two-stage optimization. Empirical examples are presented to show that it is possible to automatically produce fully formatted optimal test forms directly from item pools up to some 2,000 items on a regular PC in realistic times.
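
A minimal sketch of the kind of mixed-integer model referred to above, using the open-source PuLP package (the item data, single-theta information target, and content bound are placeholders; the paper's simultaneous selection-plus-formatting model is considerably richer):

    import pulp

    info = [0.42, 0.30, 0.55, 0.28, 0.47, 0.35]        # item information at a target theta
    content = ["alg", "geo", "alg", "geo", "alg", "geo"]

    prob = pulp.LpProblem("test_assembly", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(len(info))]

    prob += pulp.lpSum(info[i] * x[i] for i in range(len(info)))     # maximize information
    prob += pulp.lpSum(x) == 4                                       # fixed test length
    prob += pulp.lpSum(x[i] for i in range(len(info))
                       if content[i] == "alg") >= 2                  # content constraint

    prob.solve()
    selected = [i for i, v in enumerate(x) if v.value() == 1]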

VL - 37 UR - http://apm.sagepub.com/content/37/5/361.abstract ER - TY - JOUR T1 - A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing JF - Journal of Educational and Behavioral Statistics Y1 - 2013 A1 - Wang, Chun A1 - Fan, Zhewen A1 - Chang, Hua-Hua A1 - Douglas, Jeffrey A. AB -

The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees’ responses to each item, we can investigate the amount of time examinees spend on each item. Current models for RTs mainly focus on parametric models, which have the advantage of conciseness, but may suffer from reduced flexibility to fit real data. We propose a semiparametric approach, specifically, the Cox proportional hazards model with a latent speed covariate to model the RTs, embedded within the hierarchical framework proposed by van der Linden to model the RTs and response accuracy simultaneously. This semiparametric approach combines the flexibility of nonparametric modeling and the brevity and interpretability of the parametric modeling. A Markov chain Monte Carlo method for parameter estimation is given and may be used with sparse data obtained by computerized adaptive testing. Both simulation studies and real data analysis are carried out to demonstrate the applicability of the new model.
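
Schematically, the response-time part of such a hierarchical model can be written as a proportional hazards model in which the person's latent speed enters as a log-linear covariate on an item-specific, nonparametric baseline hazard (the parameterization shown is a generic illustration; signs and additional parameters may differ from the authors' specification):

    h_{ij}(t) = h_{0j}(t) \exp(\tau_i),

so that a larger latent speed \tau_i raises the hazard and shortens the expected response time of person i on item j, while the baseline hazard h_{0j}(t) is left unspecified and handled nonparametrically.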

VL - 38 UR - http://jeb.sagepub.com/cgi/content/abstract/38/4/381 ER - TY - JOUR T1 - Balancing Flexible Constraints and Measurement Precision in Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2012 A1 - Moyer, Eric L. A1 - Galindo, Jennifer L. A1 - Dodd, Barbara G. AB -

Managing test specifications—both multiple nonstatistical constraints and flexibly defined constraints—has become an important part of designing item selection procedures for computerized adaptive tests (CATs) in achievement testing. This study compared the effectiveness of three procedures: constrained CAT, flexible modified constrained CAT, and the weighted penalty model in balancing multiple flexible constraints and maximizing measurement precision in a fixed-length CAT. The study also addressed the effect of two different test lengths—25 items and 50 items—and of including or excluding the randomesque item exposure control procedure with the three methods, all of which were found effective in selecting items that met flexible test constraints when used in the item selection process for longer tests. When the randomesque method was included to control for item exposure, the weighted penalty model and the flexible modified constrained CAT models performed better than did the constrained CAT procedure in maintaining measurement precision. When no item exposure control method was used in the item selection process, no practical difference was found in the measurement precision of each balancing method.

VL - 72 UR - http://epm.sagepub.com/content/72/4/629.abstract ER - TY - JOUR T1 - Comparison of Exposure Controls, Item Pool Characteristics, and Population Distributions for CAT Using the Partial Credit Model JF - Educational and Psychological Measurement Y1 - 2012 A1 - Lee, HwaYoung A1 - Dodd, Barbara G. AB -

This study investigated item exposure control procedures under various combinations of item pool characteristics and ability distributions in computerized adaptive testing based on the partial credit model. Three variables were manipulated: item pool characteristics (120 items for each of easy, medium, and hard item pools), two ability distributions (normally distributed and negatively skewed data), and three exposure control procedures (randomesque procedure, progressive–restricted procedure, and maximum information procedure). A number of measurement precision indexes such as descriptive statistics, correlations between known and estimated ability levels, bias, root mean squared error, and average absolute difference, exposure rates, item usage, and item overlap were computed to assess the impact of matched or nonmatched item pool and ability distributions on the accuracy of ability estimation and the performance of exposure control procedures. As expected, the medium item pool produced better precision of measurement than both the easy and hard item pools. The progressive–restricted procedure performed better in terms of maximum exposure rates, item average overlap, and pool utilization than both the randomesque procedure and the maximum information procedure. The easy item pool with the negatively skewed data as a mismatched condition produced the worst performance.

VL - 72 UR - http://epm.sagepub.com/content/72/1/159.abstract ER - TY - JOUR T1 - Item Selection and Ability Estimation Procedures for a Mixed-Format Adaptive Test JF - Applied Measurement in Education Y1 - 2012 A1 - Ho, Tsung-Han A1 - Dodd, Barbara G. VL - 25 UR - http://www.tandfonline.com/doi/abs/10.1080/08957347.2012.714686 ER - TY - JOUR T1 - Panel Design Variations in the Multistage Test Using the Mixed-Format Tests JF - Educational and Psychological Measurement Y1 - 2012 A1 - Kim, Jiseon A1 - Chung, Hyewon A1 - Dodd, Barbara G. A1 - Park, Ryoungsun AB -

This study compared various panel designs of the multistage test (MST) using mixed-format tests in the context of classification testing. Simulations varied the design of the first-stage module. The first stage was constructed according to three levels of test information functions (TIFs) with three different TIF centers. Additional computerized adaptive test (CAT) conditions provided baseline comparisons. Three passing rate conditions were also included. The various MST conditions using mixed-format tests were constructed properly and performed well. When the levels of TIFs at the first stage were higher, the simulations produced a greater number of correct classifications. CAT with the randomesque-10 procedure yielded comparable results to the MST with increased levels of TIFs. Finally, all MST conditions achieved better test security results compared with CAT’s maximum information conditions.

VL - 72 UR - http://epm.sagepub.com/content/72/4/574.abstract ER - TY - JOUR T1 - The Problem of Bias in Person Parameter Estimation in Adaptive Testing JF - Applied Psychological Measurement Y1 - 2012 A1 - Doebler, Anna AB -

It is shown that deviations of estimated from true values of item difficulty parameters, caused for example by item calibration errors, the neglect of randomness of item difficulty parameters, testlet effects, or rule-based item generation, can lead to systematic bias in point estimation of person parameters in the context of adaptive testing. This effect occurs even when the errors of the item difficulty parameters are themselves unbiased. Analytical calculations as well as simulation studies are discussed.
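
A one-line heuristic for why unbiased item-parameter errors can still bias the person estimate (a standard second-order argument, not the paper's derivation): because the estimator \hat\theta is a nonlinear function of an item's difficulty b, a Taylor expansion around the true value gives, for a zero-mean calibration error e with variance \sigma_e^2,

    E_e[\hat\theta(b + e)] \approx \hat\theta(b) + \tfrac{1}{2}\,\hat\theta''(b)\,\sigma_e^2,

so the curvature term produces systematic bias even though E[e] = 0.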

VL - 36 UR - http://apm.sagepub.com/content/36/4/255.abstract ER - TY - JOUR T1 - On the Reliability and Validity of a Numerical Reasoning Speed Dimension Derived From Response Times Collected in Computerized Testing JF - Educational and Psychological Measurement Y1 - 2012 A1 - Davison, Mark L. A1 - Semmes, Robert A1 - Huang, Lan A1 - Close, Catherine N. AB -

Data from 181 college students were used to assess whether math reasoning item response times in computerized testing can provide valid and reliable measures of a speed dimension. The alternate forms reliability of the speed dimension was .85. A two-dimensional structural equation model suggests that the speed dimension is related to the accuracy of speeded responses. Speed factor scores were significantly correlated with performance on the ACT math scale. Results suggest that the speed dimension underlying response times can be reliably measured and that the dimension is related to the accuracy of performance under the pressure of time limits.

VL - 72 UR - http://epm.sagepub.com/content/72/2/245.abstract ER - TY - CONF T1 - Impact of Item Drift on Candidate Ability Estimation T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Sarah Hagge A1 - Ada Woo A1 - Phil Dickison KW - item drift AB -

For large operational pools, candidate ability estimates appear robust to item drift, especially under conditions that may represent ‘normal’ amounts of drift. Even with ‘extreme’ conditions of drift (e.g., 20% of items drifting 1.00 logits), decision consistency was still high.

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - A New Stopping Rule for Computerized Adaptive Testing JF - Educational and Psychological Measurement Y1 - 2011 A1 - Choi, Seung W. A1 - Grady, Matthew W. A1 - Dodd, Barbara G. AB -

The goal of the current study was to introduce a new stopping rule for computerized adaptive testing (CAT). The predicted standard error reduction (PSER) stopping rule uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared with that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant.
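
A schematic version of this kind of stopping decision (not the authors' exact PSER computation) compares the current posterior variance with the variance expected after administering one more item and stops when the predicted reduction falls below a tolerance; the grid and tolerance below are illustrative.

    import numpy as np

    grid = np.linspace(-4, 4, 161)

    def posterior_variance(post):
        mean = np.sum(grid * post)
        return np.sum((grid - mean) ** 2 * post)

    def predicted_variance(post, a, b):
        """Expected posterior variance if one more 2PL item were administered."""
        p = 1.0 / (1.0 + np.exp(-a * (grid - b)))
        p_right = np.sum(post * p)                      # predictive P(correct)
        post_right = post * p
        post_right /= post_right.sum()
        post_wrong = post * (1.0 - p)
        post_wrong /= post_wrong.sum()
        return (p_right * posterior_variance(post_right)
                + (1.0 - p_right) * posterior_variance(post_wrong))

    def should_stop(post, next_item, tol=0.001):
        a, b = next_item
        return posterior_variance(post) - predicted_variance(post, a, b) < tol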

VL - 71 UR - http://epm.sagepub.com/content/71/1/37.abstract ER - TY - CONF T1 - The Use of Decision Trees for Adaptive Item Selection and Score Estimation T2 - Annual Conference of the International Association for Computerized Adaptive Testing Y1 - 2011 A1 - Barth B. Riley A1 - Rodney Funk A1 - Michael L. Dennis A1 - Richard D. Lennox A1 - Matthew Finkelman KW - adaptive item selection KW - CAT KW - decision tree AB -

Conducted post-hoc simulations comparing the relative efficiency and precision of decision trees (using CHAID and CART) vs. IRT-based CAT.

Conclusions

Decision tree methods were more efficient than CAT. However, CAT selects items based on two criteria: item location relative to the current estimate of theta, and item discrimination. Decision trees instead select items that best discriminate between groups defined by the total score, and CAT is optimal only when the trait level is well estimated.
Findings suggest that combining a decision tree stage followed by CAT item selection may be advantageous.
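
As a toy sketch of the tree-based item selection idea summarized above (simulated data and arbitrary settings, not the study's CHAID/CART analyses), each split in a fitted CART model corresponds to "administering" one item, so the path through the tree is a short, adaptive item sequence:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    responses = rng.integers(0, 2, size=(500, 20))       # simulated 0/1 item responses
    group = (responses.sum(axis=1) >= 12).astype(int)    # groups defined by the total score

    tree = DecisionTreeClassifier(max_depth=5, random_state=0)
    tree.fit(responses, group)

    # Feature indices used at the first few nodes = the first items the tree "asks about".
    print(tree.tree_.feature[:5])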

JF - Annual Conference of the International Association for Computerized Adaptive Testing ER - TY - JOUR T1 - A comparison of content-balancing procedures for estimating multiple clinical domains in computerized adaptive testing: Relative precision, validity, and detection of persons with misfitting responses JF - Applied Psychological Measurement Y1 - 2010 A1 - Riley, B. B. A1 - Dennis, M. L. A1 - Conrad, K. J. AB - This simulation study sought to compare four different computerized adaptive testing (CAT) content-balancing procedures designed for use in a multidimensional assessment with respect to measurement precision, symptom severity classification, validity of clinical diagnostic recommendations, and sensitivity to atypical responding. The four content-balancing procedures were (a) no content balancing, (b) screener-based, (c) mixed (screener plus content balancing), and (d) full content balancing. In full content balancing and in mixed content balancing following administration of the screener items, item selection was based on (a) whether the target numberof items for the item’s subscale was reached and (b) the item’s information function. Mixed and full content balancing provided the best representation of items from each of the main subscales of the Internal Mental Distress Scale. These procedures also resulted in higher CAT to full-scale correlations for the Trauma and Homicidal/Suicidal Thought subscales and improved detection of atypical responding.Keywords VL - 34 SN - 0146-62161552-3497 ER - TY - JOUR T1 - A Comparison of Content-Balancing Procedures for Estimating Multiple Clinical Domains in Computerized Adaptive Testing: Relative Precision, Validity, and Detection of Persons With Misfitting Responses JF - Applied Psychological Measurement Y1 - 2010 A1 - Barth B. Riley A1 - Michael L. Dennis A1 - Conrad, Kendon J. AB -

This simulation study sought to compare four different computerized adaptive testing (CAT) content-balancing procedures designed for use in a multidimensional assessment with respect to measurement precision, symptom severity classification, validity of clinical diagnostic recommendations, and sensitivity to atypical responding. The four content-balancing procedures were (a) no content balancing, (b) screener-based, (c) mixed (screener plus content balancing), and (d) full content balancing. In full content balancing and in mixed content balancing following administration of the screener items, item selection was based on (a) whether the target number of items for the item’s subscale was reached and (b) the item’s information function. Mixed and full content balancing provided the best representation of items from each of the main subscales of the Internal Mental Distress Scale. These procedures also resulted in higher CAT to full-scale correlations for the Trauma and Homicidal/Suicidal Thought subscales and improved detection of atypical responding.
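
A minimal sketch of the full content-balancing rule described above (helper names and the 2PL information function are illustrative simplifications): selection is restricted to subscales that have not yet reached their target item counts, and the most informative eligible item is chosen.

    import math

    def info_2pl(theta, a, b):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        return a * a * p * (1.0 - p)

    def select_item(pool, theta, given_per_scale, target_per_scale):
        """Pick the most informative unused item from a subscale still under its target."""
        open_scales = {s for s, t in target_per_scale.items()
                       if given_per_scale.get(s, 0) < t}
        candidates = [it for it in pool
                      if not it["used"] and it["scale"] in open_scales]
        if not candidates:                      # all targets met: fall back to the full pool
            candidates = [it for it in pool if not it["used"]]
        return max(candidates, key=lambda it: info_2pl(theta, it["a"], it["b"]))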

VL - 34 UR - http://apm.sagepub.com/content/34/6/410.abstract ER - TY - JOUR T1 - A Comparison of Item Selection Techniques for Testlets JF - Applied Psychological Measurement Y1 - 2010 A1 - Murphy, Daniel L. A1 - Dodd, Barbara G. A1 - Vaughn, Brandon K. AB -

This study examined the performance of the maximum Fisher’s information, the maximum posterior weighted information, and the minimum expected posterior variance methods for selecting items in a computerized adaptive testing system when the items were grouped in testlets. A simulation study compared the efficiency of ability estimation among the item selection techniques under varying conditions of local-item dependency when the response model was either the three-parameter-logistic item response theory or the three-parameter-logistic testlet response theory. The item selection techniques performed similarly within any particular condition, the practical implications of which are discussed within the article.

VL - 34 UR - http://apm.sagepub.com/content/34/6/424.abstract ER - TY - JOUR T1 - Development and validation of patient-reported outcome measures for sleep disturbance and sleep-related impairments JF - Sleep Y1 - 2010 A1 - Buysse, D. J. A1 - Yu, L. A1 - Moul, D. E. A1 - Germain, A. A1 - Stover, A. A1 - Dodds, N. E. A1 - Johnston, K. L. A1 - Shablesky-Cade, M. A. A1 - Pilkonis, P. A. KW - *Outcome Assessment (Health Care) KW - *Self Disclosure KW - Adult KW - Aged KW - Aged, 80 and over KW - Cross-Sectional Studies KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Middle Aged KW - Psychometrics KW - Questionnaires KW - Reproducibility of Results KW - Sleep Disorders/*diagnosis KW - Young Adult AB - STUDY OBJECTIVES: To develop an archive of self-report questions assessing sleep disturbance and sleep-related impairments (SRI), to develop item banks from this archive, and to validate and calibrate the item banks using classic validation techniques and item response theory analyses in a sample of clinical and community participants. DESIGN: Cross-sectional self-report study. SETTING: Academic medical center and participant homes. PARTICIPANTS: One thousand nine hundred ninety-three adults recruited from an Internet polling sample and 259 adults recruited from medical, psychiatric, and sleep clinics. INTERVENTIONS: None. MEASUREMENTS AND RESULTS: This study was part of PROMIS (Patient-Reported Outcomes Information System), a National Institutes of Health Roadmap initiative. Self-report item banks were developed through an iterative process of literature searches, collecting and sorting items, expert content review, qualitative patient research, and pilot testing. Internal consistency, convergent validity, and exploratory and confirmatory factor analysis were examined in the resulting item banks. Factor analyses identified 2 preliminary item banks, sleep disturbance and SRI. Item response theory analyses and expert content review narrowed the item banks to 27 and 16 items, respectively. Validity of the item banks was supported by moderate to high correlations with existing scales and by significant differences in sleep disturbance and SRI scores between participants with and without sleep disorders. CONCLUSIONS: The PROMIS sleep disturbance and SRI item banks have excellent measurement properties and may prove to be useful for assessing general aspects of sleep and SRI with various groups of patients and interventions. VL - 33 SN - 0161-8105 (Print)0161-8105 (Linking) N1 - Buysse, Daniel JYu, LanMoul, Douglas EGermain, AnneStover, AngelaDodds, Nathan EJohnston, Kelly LShablesky-Cade, Melissa APilkonis, Paul AAR052155/AR/NIAMS NIH HHS/United StatesU01AR52155/AR/NIAMS NIH HHS/United StatesU01AR52158/AR/NIAMS NIH HHS/United StatesU01AR52170/AR/NIAMS NIH HHS/United StatesU01AR52171/AR/NIAMS NIH HHS/United StatesU01AR52177/AR/NIAMS NIH HHS/United StatesU01AR52181/AR/NIAMS NIH HHS/United StatesU01AR52186/AR/NIAMS NIH HHS/United StatesResearch Support, N.I.H., ExtramuralValidation StudiesUnited StatesSleepSleep. 2010 Jun 1;33(6):781-92. U2 - 2880437 ER - TY - CHAP T1 - Innovative Items for Computerized Testing T2 - Elements of Adaptive Testing Y1 - 2010 A1 - Parshall, C. G. A1 - Harmes, J. C. A1 - Davey, T. A1 - Pashley, P. J. JF - Elements of Adaptive Testing ER - TY - JOUR T1 - A new stopping rule for computerized adaptive testing JF - Educational and Psychological Measurement Y1 - 2010 A1 - Choi, S. W. A1 - Grady, M. W. A1 - Dodd, B. G. 
AB - The goal of the current study was to introduce a new stopping rule for computerized adaptive testing. The predicted standard error reduction stopping rule (PSER) uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared to that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant. VL - 70 SN - 0013-1644 (Print)0013-1644 (Linking) N1 - U01 AR052177-04/NIAMS NIH HHS/Educ Psychol Meas. 2010 Dec 1;70(6):1-17. U2 - 3028267 ER - TY - JOUR T1 - Stratified and maximum information item selection procedures in computer adaptive testing JF - Journal of Educational Measurement Y1 - 2010 A1 - Deng, H. A1 - Ansley, T. A1 - Chang, H.-H. VL - 47 ER - TY - JOUR T1 - Stratified and Maximum Information Item Selection Procedures in Computer Adaptive Testing JF - Journal of Educational Measurement Y1 - 2010 A1 - Deng, Hui A1 - Ansley, Timothy A1 - Chang, Hua-Hua AB -

In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with completely random item selection (RAN). The comparisons were with respect to error variances, reliability of ability estimates and item usage through CATs simulated under nine test conditions of various practical constraints and item selection space. The results showed that F had an apparent precision advantage over STR and USTR under unconstrained item selection, but with very poor item usage. USTR reduced error variances for STR under various conditions, with small compromises in item usage. Compared to F, USTR enhanced item usage while achieving comparable precision in ability estimates; it achieved a precision level similar to F with improved item usage when items were selected under exposure control and with limited item selection space. The results provide implications for choosing an appropriate item selection procedure in applied settings.

VL - 47 UR - http://dx.doi.org/10.1111/j.1745-3984.2010.00109.x ER - TY - ABST T1 - Validation of a computer-adaptive test to evaluate generic health-related quality of life Y1 - 2010 A1 - Rebollo, P. A1 - Castejon, I. A1 - Cuervo, J. A1 - Villa, G. A1 - Garcia-Cueto, E. A1 - Diaz-Cuervo, H. A1 - Zardain, P. C. A1 - Muniz, J. A1 - Alonso, J. AB - BACKGROUND: Health Related Quality of Life (HRQoL) is a relevant variable in the evaluation of health outcomes. Questionnaires based on Classical Test Theory typically require a large number of items to evaluate HRQoL. Computer Adaptive Testing (CAT) can be used to reduce tests length while maintaining and, in some cases, improving accuracy. This study aimed at validating a CAT based on Item Response Theory (IRT) for evaluation of generic HRQoL: the CAT-Health instrument. METHODS: Cross-sectional study of subjects aged over 18 attending Primary Care Centres for any reason. CAT-Health was administered along with the SF-12 Health Survey. Age, gender and a checklist of chronic conditions were also collected. CAT-Health was evaluated considering: 1) feasibility: completion time and test length; 2) content range coverage, Item Exposure Rate (IER) and test precision; and 3) construct validity: differences in the CAT-Health scores according to clinical variables and correlations between both questionnaires. RESULTS: 396 subjects answered CAT-Health and SF-12, 67.2% females, mean age (SD) 48.6 (17.7) years. 36.9% did not report any chronic condition. Median completion time for CAT-Health was 81 seconds (IQ range = 59-118) and it increased with age (p < 0.001). The median number of items administered was 8 (IQ range = 6-10). Neither ceiling nor floor effects were found for the score. None of the items in the pool had an IER of 100% and it was over 5% for 27.1% of the items. Test Information Function (TIF) peaked between levels -1 and 0 of HRQoL. Statistically significant differences were observed in the CAT-Health scores according to the number and type of conditions. CONCLUSIONS: Although domain-specific CATs exist for various areas of HRQoL, CAT-Health is one of the first IRT-based CATs designed to evaluate generic HRQoL and it has proven feasible, valid and efficient, when administered to a broad sample of individuals attending primary care settings. JF - Health and Quality of Life Outcomes VL - 8 SN - 1477-7525 (Electronic)1477-7525 (Linking) N1 - Rebollo, PabloCastejon, IgnacioCuervo, JesusVilla, GuillermoGarcia-Cueto, EduardoDiaz-Cuervo, HelenaZardain, Pilar CMuniz, JoseAlonso, JordiSpanish CAT-Health Research GroupEnglandHealth Qual Life Outcomes. 2010 Dec 3;8:147. U2 - 3022567 ER - TY - CHAP T1 - Comparison of ability estimation and item selection methods in multidimensional computerized adaptive testing Y1 - 2009 A1 - Diao, Q. A1 - Reckase, M. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 342 KB} ER - TY - JOUR T1 - Constraint-Weighted a-Stratification for Computerized Adaptive Testing With Nonstatistical Constraints JF - Educational and Psychological Measurement Y1 - 2009 A1 - Ying Cheng, A1 - Chang, Hua-Hua A1 - Douglas, Jeffrey A1 - Fanmin Guo, AB -

a-stratification is a method that administers items with small discrimination (a) parameters early in an exam and items with higher a values once more is learned about the ability parameter. It can achieve much better item usage than the maximum information criterion (MIC). To make a-stratification more practical and more widely applicable, a method for weighting the item selection process in a-stratification as a means of satisfying multiple test constraints is proposed. This method is studied in simulation against an analogous method without stratification, as well as against a-stratification using descending- rather than ascending-a procedures. In addition, a variation of a-stratification that allows for unbalanced usage of a parameters is included in the study to examine the trade-off between efficiency and exposure control. Finally, MIC and randomized item selection are included as baselines. Results indicate that the weighting mechanism successfully addresses the constraints, that stratification helps to a great extent in balancing exposure rates, and that the ascending-a design improves measurement precision.
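
As one way to picture the weighting idea described above, the sketch below scores each candidate item in the current a-stratum by its difficulty match to the provisional theta plus a bonus for content areas that still fall short of their target counts. The weighting function, content areas, and targets here are hypothetical illustrations, not the formulation used in the article.

```python
# Hypothetical illustration of constraint-weighted selection within an a-stratum:
# candidates are scored by |b - theta| minus a bonus for under-represented content
# areas, so items from content areas still below their targets are favored.
import numpy as np

rng = np.random.default_rng(1)
n_items = 200
a = np.sort(rng.uniform(0.5, 2.0, n_items))       # items ordered by ascending a
b = rng.normal(0.0, 1.0, n_items)
content = rng.integers(0, 3, n_items)             # three hypothetical content areas
target = {0: 4, 1: 3, 2: 3}                       # desired counts in a 10-item CAT

def constraint_weighted_pick(theta, candidates, counts, weight=1.0):
    """Return the candidate with the best (lowest) constraint-weighted score."""
    def score(i):
        deficit = max(target[int(content[i])] - counts[int(content[i])], 0)
        return abs(b[i] - theta) - weight * deficit
    return min(candidates, key=score)

strata = np.array_split(np.arange(n_items), 4)    # low-a strata administered first
counts, used, theta_hat = {0: 0, 1: 0, 2: 0}, [], 0.0
for pos in range(10):
    stratum = strata[min(pos * 4 // 10, 3)]
    candidates = [i for i in stratum if i not in used]
    item = constraint_weighted_pick(theta_hat, candidates, counts)
    used.append(item)
    counts[int(content[item])] += 1
print(used, counts)
```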

VL - 69 UR - http://epm.sagepub.com/content/69/1/35.abstract ER - TY - JOUR T1 - A Knowledge-Based Approach for Item Exposure Control in Computerized Adaptive Testing JF - Journal of Educational and Behavioral Statistics Y1 - 2009 A1 - Doong, S. H. AB -

The purpose of this study is to investigate a functional relation between item exposure parameters (IEPs) and item parameters (IPs) over parallel pools. This functional relation is approximated by a well-known tool in machine learning. Let P and Q be parallel item pools and suppose IEPs for P have been obtained via a Sympson and Hetter–type simulation. Based on these simulated parameters, a functional relation k = f_P(a, b, c) relating IPs to IEPs of P is obtained by an artificial neural network and used to estimate IEPs of Q without tedious simulation. Extensive experiments using real and synthetic pools showed that this approach worked well for many variants of the Sympson and Hetter procedure. It worked excellently for the conditional Stocking and Lewis multinomial selection procedure and the Chen and Lei item exposure and test overlap control procedure. This study provides a first step toward an alternative means of estimating IEPs without iterative simulation.
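
A rough sketch of that idea follows: fit a small neural network mapping (a, b, c) to exposure parameters k on one pool, then apply it to a parallel pool. The synthetic k values and the use of scikit-learn's MLPRegressor are assumptions for illustration; in practice the training k values would come from an iterative Sympson-Hetter simulation.

```python
# Rough sketch of the approach described above: learn k = f_P(a, b, c) on pool P with a
# small neural network, then predict exposure parameters for a parallel pool Q.
# Assumes scikit-learn is available; the k values below are a synthetic stand-in for
# parameters obtained from an iterative Sympson-Hetter simulation.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

def make_pool(n):
    a = rng.lognormal(0.0, 0.3, n)                 # discrimination
    b = rng.normal(0.0, 1.0, n)                    # difficulty
    c = rng.uniform(0.1, 0.25, n)                  # guessing
    k = np.clip(0.25 + 0.3 * (a - a.mean()) - 0.05 * np.abs(b), 0.02, 1.0)
    return np.column_stack([a, b, c]), k

X_P, k_P = make_pool(500)                          # pool P with "simulated" IEPs
X_Q, _ = make_pool(500)                            # parallel pool Q, IEPs unknown

net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
net.fit(X_P, k_P)                                  # approximate k = f_P(a, b, c)
k_Q_hat = np.clip(net.predict(X_Q), 0.0, 1.0)      # predicted IEPs for pool Q
print(np.round(k_Q_hat[:5], 3))
```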

VL - 34 UR - http://jeb.sagepub.com/cgi/content/abstract/34/4/530 ER - TY - JOUR T1 - Measuring global physical health in children with cerebral palsy: Illustration of a multidimensional bi-factor model and computerized adaptive testing JF - Quality of Life Research Y1 - 2009 A1 - Haley, S. M. A1 - Ni, P. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. A1 - Hambleton, R. K. A1 - Montpetit, K. A1 - Bilodeau, N. A1 - Gorton, G. E. A1 - Watson, K. A1 - Tucker, C. A. KW - *Computer Simulation KW - *Health Status KW - *Models, Statistical KW - Adaptation, Psychological KW - Adolescent KW - Cerebral Palsy/*physiopathology KW - Child KW - Child, Preschool KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Massachusetts KW - Pennsylvania KW - Questionnaires KW - Young Adult AB - PURPOSE: The purposes of this study were to apply a bi-factor model for the determination of test dimensionality and a multidimensional CAT using computer simulations of real data for the assessment of a new global physical health measure for children with cerebral palsy (CP). METHODS: Parent respondents of 306 children with cerebral palsy were recruited from four pediatric rehabilitation hospitals and outpatient clinics. We compared confirmatory factor analysis results across four models: (1) one-factor unidimensional; (2) two-factor multidimensional (MIRT); (3) bi-factor MIRT with fixed slopes; and (4) bi-factor MIRT with varied slopes. We tested whether the general and content (fatigue and pain) person score estimates could discriminate across severity and types of CP, and whether score estimates from a simulated CAT were similar to estimates based on the total item bank, and whether they correlated as expected with external measures. RESULTS: Confirmatory factor analysis suggested separate pain and fatigue sub-factors; all 37 items were retained in the analyses. From the bi-factor MIRT model with fixed slopes, the full item bank scores discriminated across levels of severity and types of CP, and compared favorably to external instruments. CAT scores based on 10- and 15-item versions accurately captured the global physical health scores. CONCLUSIONS: The bi-factor MIRT CAT application, especially the 10- and 15-item versions, yielded accurate global physical health scores that discriminated across known severity groups and types of CP, and correlated as expected with concurrent measures. The CATs have potential for collecting complex data on the physical health of children with CP in an efficient manner. VL - 18 SN - 0962-9343 (Print)0962-9343 (Linking) N1 - Haley, Stephen MNi, PengshengDumas, Helene MFragala-Pinkham, Maria AHambleton, Ronald KMontpetit, KathleenBilodeau, NathalieGorton, George EWatson, KyleTucker, Carole AK02 HD045354-01A1/HD/NICHD NIH HHS/United StatesK02 HD45354-01A1/HD/NICHD NIH HHS/United StatesResearch Support, N.I.H., ExtramuralResearch Support, Non-U.S. Gov'tNetherlandsQuality of life research : an international journal of quality of life aspects of treatment, care and rehabilitationQual Life Res. 2009 Apr;18(3):359-70. Epub 2009 Feb 17. U2 - 2692519 ER - TY - CHAP T1 - Obtaining reliable diagnostic information through constrained CAT Y1 - 2009 A1 - Wang, C. A1 - Chang, Hua-Hua A1 - Douglas, J. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. 
N1 - {PDF File, 252 KB} ER - TY - JOUR T1 - Assessing self-care and social function using a computer adaptive testing version of the pediatric evaluation of disability inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2008 A1 - Coster, W. J. A1 - Haley, S. M. A1 - Ni, P. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. KW - *Disability Evaluation KW - *Social Adjustment KW - Activities of Daily Living KW - Adolescent KW - Age Factors KW - Child KW - Child, Preschool KW - Computer Simulation KW - Cross-Over Studies KW - Disabled Children/*rehabilitation KW - Female KW - Follow-Up Studies KW - Humans KW - Infant KW - Male KW - Outcome Assessment (Health Care) KW - Reference Values KW - Reproducibility of Results KW - Retrospective Studies KW - Risk Factors KW - Self Care/*standards/trends KW - Sex Factors KW - Sickness Impact Profile AB - OBJECTIVE: To examine score agreement, validity, precision, and response burden of a prototype computer adaptive testing (CAT) version of the self-care and social function scales of the Pediatric Evaluation of Disability Inventory compared with the full-length version of these scales. DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics; community-based day care, preschool, and children's homes. PARTICIPANTS: Children with disabilities (n=469) and 412 children with no disabilities (analytic sample); 38 children with disabilities and 35 children without disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from prototype CAT applications of each scale using 15-, 10-, and 5-item stopping rules; scores from the full-length self-care and social function scales; time (in seconds) to complete assessments and respondent ratings of burden. RESULTS: Scores from both computer simulations and field administration of the prototype CATs were highly consistent with scores from full-length administration (r range, .94-.99). Using computer simulation of retrospective data, discriminant validity, and sensitivity to change of the CATs closely approximated that of the full-length scales, especially when the 15- and 10-item stopping rules were applied. In the cross-validation study the time to administer both CATs was 4 minutes, compared with over 16 minutes to complete the full-length scales. CONCLUSIONS: Self-care and social function score estimates from CAT administration are highly comparable with those obtained from full-length scale administration, with small losses in validity and precision and substantial decreases in administration time. VL - 89 SN - 1532-821X (Electronic)0003-9993 (Linking) N1 - Coster, Wendy JHaley, Stephen MNi, PengshengDumas, Helene MFragala-Pinkham, Maria AK02 HD45354-01A1/HD/NICHD NIH HHS/United StatesR41 HD052318-01A1/HD/NICHD NIH HHS/United StatesR43 HD42388-01/HD/NICHD NIH HHS/United StatesComparative StudyResearch Support, N.I.H., ExtramuralUnited StatesArchives of physical medicine and rehabilitationArch Phys Med Rehabil. 2008 Apr;89(4):622-9. U2 - 2666276 ER - TY - JOUR T1 - Computerized adaptive testing in back pain: Validation of the CAT-5D-QOL JF - Spine Y1 - 2008 A1 - Kopec, J. A. A1 - Badii, M. A1 - McKenna, M. A1 - Lima, V. D. A1 - Sayre, E. C. A1 - Dvorak, M. 
KW - *Disability Evaluation KW - *Health Status Indicators KW - *Quality of Life KW - Adult KW - Aged KW - Algorithms KW - Back Pain/*diagnosis/psychology KW - British Columbia KW - Diagnosis, Computer-Assisted/*standards KW - Feasibility Studies KW - Female KW - Humans KW - Internet KW - Male KW - Middle Aged KW - Predictive Value of Tests KW - Questionnaires/*standards KW - Reproducibility of Results AB - STUDY DESIGN: We have conducted an outcome instrument validation study. OBJECTIVE: Our objective was to develop a computerized adaptive test (CAT) to measure 5 domains of health-related quality of life (HRQL) and assess its feasibility, reliability, validity, and efficiency. SUMMARY OF BACKGROUND DATA: Kopec and colleagues have recently developed item response theory based item banks for 5 domains of HRQL relevant to back pain and suitable for CAT applications. The domains are Daily Activities (DAILY), Walking (WALK), Handling Objects (HAND), Pain or Discomfort (PAIN), and Feelings (FEEL). METHODS: An adaptive algorithm was implemented in a web-based questionnaire administration system. The questionnaire included CAT-5D-QOL (5 scales), Modified Oswestry Disability Index (MODI), Roland-Morris Disability Questionnaire (RMDQ), SF-36 Health Survey, and standard clinical and demographic information. Participants were outpatients treated for mechanical back pain at a referral center in Vancouver, Canada. RESULTS: A total of 215 patients completed the questionnaire and 84 completed a retest. On average, patients answered 5.2 items per CAT-5D-QOL scale. Reliability ranged from 0.83 (FEEL) to 0.92 (PAIN) and was 0.92 for the MODI, RMDQ, and Physical Component Summary (PCS-36). The ceiling effect was 0.5% for PAIN compared with 2% for MODI and 5% for RMQ. The CAT-5D-QOL scales correlated as anticipated with other measures of HRQL and discriminated well according to the level of satisfaction with current symptoms, duration of the last episode, sciatica, and disability compensation. The average relative discrimination index was 0.87 for PAIN, 0.67 for DAILY and 0.62 for WALK, compared with 0.89 for MODI, 0.80 for RMDQ, and 0.59 for PCS-36. CONCLUSION: The CAT-5D-QOL is feasible, reliable, valid, and efficient in patients with back pain. This methodology can be recommended for use in back pain research and should improve outcome assessment, facilitate comparisons across studies, and reduce patient burden. VL - 33 SN - 1528-1159 (Electronic)0362-2436 (Linking) N1 - Kopec, Jacek ABadii, MaziarMcKenna, MarioLima, Viviane DSayre, Eric CDvorak, MarcelResearch Support, Non-U.S. Gov'tValidation StudiesUnited StatesSpineSpine (Phila Pa 1976). 2008 May 20;33(12):1384-90. ER - TY - CONF T1 - Developing a progressive approach to using the GAIN in order to reduce the duration and cost of assessment with the GAIN short screener, Quick and computer adaptive testing T2 - Joint Meeting on Adolescent Treatment Effectiveness Y1 - 2008 A1 - Dennis, M. L. A1 - Funk, R. A1 - Titus, J. A1 - Riley, B. B. A1 - Hosman, S. A1 - Kinne, S. JF - Joint Meeting on Adolescent Treatment Effectiveness CY - Washington D.C., USA N1 - ProCite field[6]: Paper presented at the ER - TY - JOUR T1 - Letting the CAT out of the bag: Comparing computer adaptive tests and an 11-item short form of the Roland-Morris Disability Questionnaire JF - Spine Y1 - 2008 A1 - Cook, K. F. A1 - Choi, S. W. A1 - Crane, P. K. A1 - Deyo, R. A. A1 - Johnson, K. L. A1 - Amtmann, D. 
KW - *Disability Evaluation KW - *Health Status Indicators KW - Adult KW - Aged KW - Aged, 80 and over KW - Back Pain/*diagnosis/psychology KW - Calibration KW - Computer Simulation KW - Diagnosis, Computer-Assisted/*standards KW - Humans KW - Middle Aged KW - Models, Psychological KW - Predictive Value of Tests KW - Questionnaires/*standards KW - Reproducibility of Results AB - STUDY DESIGN: A post hoc simulation of a computer adaptive administration of the items of a modified version of the Roland-Morris Disability Questionnaire. OBJECTIVE: To evaluate the effectiveness of adaptive administration of back pain-related disability items compared with a fixed 11-item short form. SUMMARY OF BACKGROUND DATA: Short form versions of the Roland-Morris Disability Questionnaire have been developed. An alternative to paper-and-pencil short forms is to administer items adaptively so that items are presented based on a person's responses to previous items. Theoretically, this allows precise estimation of back pain disability with administration of only a few items. MATERIALS AND METHODS: Data were gathered from 2 previously conducted studies of persons with back pain. An item response theory model was used to calibrate scores based on all items, items of a paper-and-pencil short form, and several computer adaptive tests (CATs). RESULTS: Correlations between each CAT condition and scores based on a 23-item version of the Roland-Morris Disability Questionnaire ranged from 0.93 to 0.98. Compared with an 11-item short form, an 11-item CAT produced scores that were significantly more highly correlated with scores based on the 23-item scale. CATs with even fewer items also produced scores that were highly correlated with scores based on all items. For example, scores from a 5-item CAT had a correlation of 0.93 with full scale scores. Seven- and 9-item CATs correlated at 0.95 and 0.97, respectively. A CAT with a standard-error-based stopping rule produced scores that correlated at 0.95 with full scale scores. CONCLUSION: A CAT-based back pain-related disability measure may be a valuable tool for use in clinical and research contexts. Use of CAT for other common measures in back pain research, such as other functional scales or measures of psychological distress, may offer similar advantages. VL - 33 SN - 1528-1159 (Electronic) N1 - Cook, Karon FChoi, Seung WCrane, Paul KDeyo, Richard AJohnson, Kurt LAmtmann, Dagmar5 P60-AR48093/AR/United States NIAMS5U01AR052171-03/AR/United States NIAMSComparative StudyResearch Support, N.I.H., ExtramuralUnited StatesSpineSpine. 2008 May 20;33(12):1378-83. ER - TY - JOUR T1 - Measuring physical functioning in children with spinal impairments with computerized adaptive testing JF - Journal of Pediatric Orthopedics Y1 - 2008 A1 - Mulcahey, M. J. A1 - Haley, S. M. A1 - Duffy, T. A1 - Pengsheng, N. A1 - Betz, R. R. KW - *Disability Evaluation KW - Adolescent KW - Child KW - Child, Preschool KW - Computer Simulation KW - Cross-Sectional Studies KW - Disabled Children/*rehabilitation KW - Female KW - Humans KW - Infant KW - Kyphosis/*diagnosis/rehabilitation KW - Male KW - Prospective Studies KW - Reproducibility of Results KW - Scoliosis/*diagnosis/rehabilitation AB - BACKGROUND: The purpose of this study was to assess the utility of measuring current physical functioning status of children with scoliosis and kyphosis by applying computerized adaptive testing (CAT) methods. 
Computerized adaptive testing uses a computer interface to administer the most optimal items based on previous responses, reducing the number of items needed to obtain a scoring estimate. METHODS: This was a prospective study of 77 subjects (0.6-19.8 years) who were seen by a spine surgeon during a routine clinic visit for progress spine deformity. Using a multidimensional version of the Pediatric Evaluation of Disability Inventory CAT program (PEDI-MCAT), we evaluated content range, accuracy and efficiency, known-group validity, concurrent validity with the Pediatric Outcomes Data Collection Instrument, and test-retest reliability in a subsample (n = 16) within a 2-week interval. RESULTS: We found the PEDI-MCAT to have sufficient item coverage in both self-care and mobility content for this sample, although most patients tended to score at the higher ends of both scales. Both the accuracy of PEDI-MCAT scores as compared with a fixed format of the PEDI (r = 0.98 for both mobility and self-care) and test-retest reliability were very high [self-care: intraclass correlation (3,1) = 0.98, mobility: intraclass correlation (3,1) = 0.99]. The PEDI-MCAT took an average of 2.9 minutes for the parents to complete. The PEDI-MCAT detected expected differences between patient groups, and scores on the PEDI-MCAT correlated in expected directions with scores from the Pediatric Outcomes Data Collection Instrument domains. CONCLUSIONS: Use of the PEDI-MCAT to assess the physical functioning status, as perceived by parents of children with complex spinal impairments, seems to be feasible and achieves accurate and efficient estimates of self-care and mobility function. Additional item development will be needed at the higher functioning end of the scale to avoid ceiling effects for older children. LEVEL OF EVIDENCE: This is a level II prospective study designed to establish the utility of computer adaptive testing as an evaluation method in a busy pediatric spine practice. VL - 28 SN - 0271-6798 (Print)0271-6798 (Linking) N1 - Mulcahey, M JHaley, Stephen MDuffy, TheresaPengsheng, NiBetz, Randal RK02 HD045354-01A1/HD/NICHD NIH HHS/United StatesUnited StatesJournal of pediatric orthopedicsJ Pediatr Orthop. 2008 Apr-May;28(3):330-5. U2 - 2696932 ER - TY - JOUR T1 - Predicting item exposure parameters in computerized adaptive testing JF - British Journal of Mathematical and Statistical Psychology Y1 - 2008 A1 - Chen, S-Y. A1 - Doong, S. H. KW - *Algorithms KW - *Artificial Intelligence KW - Aptitude Tests/*statistics & numerical data KW - Diagnosis, Computer-Assisted/*statistics & numerical data KW - Humans KW - Models, Statistical KW - Psychometrics/statistics & numerical data KW - Reproducibility of Results KW - Software AB - The purpose of this study is to find a formula that describes the relationship between item exposure parameters and item parameters in computerized adaptive tests by using genetic programming (GP) - a biologically inspired artificial intelligence technique. Based on the formula, item exposure parameters for new parallel item pools can be predicted without conducting additional iterative simulations. Results show that an interesting formula between item exposure parameters and item parameters in a pool can be found by using GP. The item exposure parameters predicted based on the found formula were close to those observed from the Sympson and Hetter (1985) procedure and performed well in controlling item exposure rates. 
Similar results were observed for the Stocking and Lewis (1998) multinomial model for item selection and the Sympson and Hetter procedure with content balancing. The proposed GP approach has provided a knowledge-based solution for finding item exposure parameters. VL - 61 SN - 0007-1102 (Print)0007-1102 (Linking) N1 - Chen, Shu-YingDoong, Shing-HwangResearch Support, Non-U.S. Gov'tEnglandThe British journal of mathematical and statistical psychologyBr J Math Stat Psychol. 2008 May;61(Pt 1):75-91. ER - TY - JOUR T1 - Strategies for controlling item exposure in computerized adaptive testing with the partial credit model JF - Journal of Applied Measurement Y1 - 2008 A1 - Davis, L. L. A1 - Dodd, B. G. KW - *Algorithms KW - *Computers KW - *Educational Measurement/statistics & numerical data KW - Humans KW - Questionnaires/*standards KW - United States AB - Exposure control research with polytomous item pools has determined that randomization procedures can be very effective for controlling test security in computerized adaptive testing (CAT). The current study investigated the performance of four procedures for controlling item exposure in a CAT under the partial credit model. In addition to a no exposure control baseline condition, the Kingsbury-Zara, modified-within-.10-logits, Sympson-Hetter, and conditional Sympson-Hetter procedures were implemented to control exposure rates. The Kingsbury-Zara and the modified-within-.10-logits procedures were implemented with 3 and 6 item candidate conditions. The results show that the Kingsbury-Zara and modified-within-.10-logits procedures with 6 item candidates performed as well as the conditional Sympson-Hetter in terms of exposure rates, overlap rates, and pool utilization. These two procedures are strongly recommended for use with partial credit CATs due to their simplicity and strength of their results. VL - 9 SN - 1529-7713 (Print)1529-7713 (Linking) N1 - Davis, Laurie LaughlinDodd, Barbara GUnited StatesJournal of applied measurementJ Appl Meas. 2008;9(1):1-17. ER - TY - JOUR T1 - Using item banks to construct measures of patient reported outcomes in clinical trials: investigator perceptions JF - Clinical Trials Y1 - 2008 A1 - Flynn, K. E. A1 - Dombeck, C. B. A1 - DeWitt, E. M. A1 - Schulman, K. A. A1 - Weinfurt, K. P. AB - BACKGROUND: Item response theory (IRT) promises more sensitive and efficient measurement of patient-reported outcomes (PROs) than traditional approaches; however, the selection and use of PRO measures from IRT-based item banks differ from current methods of using PRO measures. PURPOSE: To anticipate barriers to the adoption of IRT item banks into clinical trials. METHODS: We conducted semistructured telephone or in-person interviews with 42 clinical researchers who published results from clinical trials in the Journal of the American Medical Association, the New England Journal of Medicine, or other leading clinical journals from July 2005 through May 2006. Interviews included a brief tutorial on IRT item banks. RESULTS: After the tutorial, 39 of 42 participants understood the novel products available from an IRT item bank, namely customized short forms and computerized adaptive testing. 
Most participants (38/42) thought that item banks could be useful in their clinical trials, but they mentioned several potential barriers to adoption, including economic and logistical constraints, concerns about whether item banks are better than current PRO measures, concerns about how to convince study personnel or statisticians to use item banks, concerns about FDA or sponsor acceptance, and the lack of availability of item banks validated in specific disease populations. LIMITATIONS: Selection bias might have led to more positive responses to the concept of item banks in clinical trials. CONCLUSIONS: Clinical investigators are open to a new method of PRO measurement offered in IRT item banks, but bank developers must address investigator and stakeholder concerns before widespread adoption can be expected. VL - 5 SN - 1740-7745 (Print) N1 - Flynn, Kathryn EDombeck, Carrie BDeWitt, Esi MorganSchulman, Kevin AWeinfurt, Kevin P5U01AR052186/AR/NIAMS NIH HHS/United StatesResearch Support, N.I.H., ExtramuralEnglandClinical trials (London, England)Clin Trials. 2008;5(6):575-86. ER - TY - CHAP T1 - Development of a multiple-component CAT for measuring foreign language proficiency (SIMTEST) Y1 - 2007 A1 - Sumbling, M. A1 - Sanz, P. A1 - Viladrich, M. C. A1 - Doval, E. A1 - Riera, L. CY - D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 258 KB} ER - TY - CHAP T1 - Partial order knowledge structures for CAT applications Y1 - 2007 A1 - Desmarais, M. C. A1 - Pu, X, A1 - Blais, J-G. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 475 KB} ER - TY - JOUR T1 - Relative precision, efficiency and construct validity of different starting and stopping rules for a computerized adaptive test: The GAIN Substance Problem Scale JF - Journal of Applied Measurement Y1 - 2007 A1 - Riley, B. B. A1 - Conrad, K. J. A1 - Bezruczko, N. A1 - Dennis, M. L. KW - My article AB - Substance abuse treatment programs are being pressed to measure and make clinical decisions more efficiently about an increasing array of problems. This computerized adaptive testing (CAT) simulation examined the relative efficiency, precision and construct validity of different starting and stopping rules used to shorten the Global Appraisal of Individual Needs’ (GAIN) Substance Problem Scale (SPS) and facilitate diagnosis based on it. Data came from 1,048 adolescents and adults referred to substance abuse treatment centers in 5 sites. CAT performance was evaluated using: (1) average standard errors, (2) average number of items, (3) bias in personmeasures, (4) root mean squared error of person measures, (5) Cohen’s kappa to evaluate CAT classification compared to clinical classification, (6) correlation between CAT and full-scale measures, and (7) construct validity of CAT classification vs. clinical classification using correlations with five theoretically associated instruments. Results supported both CAT efficiency and validity. VL - 8 ER - TY - JOUR T1 - A system for interactive assessment and management in palliative care JF - Journal of Pain Symptom Management Y1 - 2007 A1 - Chang, C-H. A1 - Boni-Saenz, A. A. A1 - Durazo-Arvizu, R. A. A1 - DesHarnais, S. A1 - Lau, D. T. A1 - Emanuel, L. L. 
KW - *Needs Assessment KW - Humans KW - Medical Informatics/*organization & administration KW - Palliative Care/*organization & administration AB - The availability of psychometrically sound and clinically relevant screening, diagnosis, and outcome evaluation tools is essential to high-quality palliative care assessment and management. Such data will enable us to improve patient evaluations, prognoses, and treatment selections, and to increase patient satisfaction and quality of life. To accomplish these goals, medical care needs more precise, efficient, and comprehensive tools for data acquisition, analysis, interpretation, and management. We describe a system for interactive assessment and management in palliative care (SIAM-PC), which is patient centered, model driven, database derived, evidence based, and technology assisted. The SIAM-PC is designed to reliably measure the multiple dimensions of patients' needs for palliative care, and then to provide information to clinicians, patients, and the patients' families to achieve optimal patient care, while improving our capacity for doing palliative care research. This system is innovative in its application of the state-of-the-science approaches, such as item response theory and computerized adaptive testing, to many of the significant clinical problems related to palliative care. VL - 33 SN - 0885-3924 (Print) N1 - Chang, Chih-HungBoni-Saenz, Alexander ADurazo-Arvizu, Ramon ADesHarnais, SusanLau, Denys TEmanuel, Linda LR21CA113191/CA/United States NCIResearch Support, N.I.H., ExtramuralReviewUnited StatesJournal of pain and symptom managementJ Pain Symptom Manage. 2007 Jun;33(6):745-55. Epub 2007 Mar 23. ER - TY - CHAP T1 - Use of CAT in dynamic testing Y1 - 2007 A1 - De Beer, M. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 133 KB} ER - TY - Generic T1 - The use of computerized adaptive testing to assess psychopathology using the Global Appraisal of Individual Needs T2 - American Evaluation Association Y1 - 2007 A1 - Conrad, K. J. A1 - Riley, B. B. A1 - Dennis, M. L. JF - American Evaluation Association PB - American Evaluation Association CY - Portland, OR USA ER - TY - JOUR T1 - The comparison among item selection strategies of CAT with multiple-choice items JF - Acta Psychologica Sinica Y1 - 2006 A1 - Hai-qi, D. A1 - De-zhi, C. A1 - Shuliang, D. A1 - Taiping, D. KW - CAT KW - computerized adaptive testing KW - graded response model KW - item selection strategies KW - multiple choice items AB - The initial purpose of comparing item selection strategies for CAT was to increase the efficiency of tests. As studies continued, however, it was found that increasing the efficiency of item bank using was also an important goal of comparing item selection strategies. These two goals often conflicted. The key solution was to find a strategy with which both goals could be accomplished. The item selection strategies for graded response model in this study included: the average of the difficulty orders matching with the ability; the medium of the difficulty orders matching with the ability; maximum information; A stratified (average); and A stratified (medium). The evaluation indexes used for comparison included: the bias of ability estimates for the true; the standard error of ability estimates; the average items which the examinees have administered; the standard deviation of the frequency of items selected; and sum of the indices weighted. 
Using the Monte Carlo simulation method, we generated data and replicated each condition 20 times, with the item difficulty parameters following either a normal or a uniform distribution. The results indicated that, regardless of whether the difficulty parameters followed a normal or a uniform distribution, every item selection strategy designed in this research had its strong and weak points. In the overall evaluation, under the condition that items were stratified appropriately, A stratified (medium) (ASM) had the best effect. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Science Press: China VL - 38 SN - 0439-755X (Print) ER - TY - CHAP T1 - Computer-based testing T2 - Handbook of multimethod measurement in psychology Y1 - 2006 A1 - F Drasgow A1 - Chuah, S. C. KW - Adaptive Testing computerized adaptive testing KW - Computer Assisted Testing KW - Experimentation KW - Psychometrics KW - Theories AB - (From the chapter) There has been a proliferation of research designed to explore and exploit opportunities provided by computer-based assessment. This chapter provides an overview of the diverse efforts by researchers in this area. It begins by describing how paper-and-pencil tests can be adapted for administration by computers. Computerization provides the important advantage that items can be selected so they are of appropriate difficulty for each examinee. Some of the psychometric theory needed for computerized adaptive testing is reviewed. Then research on innovative computerized assessments is summarized. These assessments go beyond multiple-choice items by using formats made possible by computerization. Then some hardware and software issues are described, and finally, directions for future work are outlined. (PsycINFO Database Record (c) 2006 APA ) JF - Handbook of multimethod measurement in psychology PB - American Psychological Association CY - Washington D.C. USA VL - xiv N1 - Using Smart Source Parsing. Handbook of multimethod measurement in psychology. (pp. 87-100). Washington, DC : American Psychological Association, [URL:http://www.apa.org/books]. xiv, 553 pp ER - TY - JOUR T1 - Computerized adaptive testing under nonparametric IRT models JF - Psychometrika Y1 - 2006 A1 - Xu, X. A1 - Douglas, J. VL - 71 ER - TY - CHAP T1 - Designing computerized adaptive tests Y1 - 2006 A1 - Davey, T. A1 - Pitoniak, M. J. CY - S.M. Downing and T. M. Haladyna (Eds.), Handbook of test development. New Jersey: Lawrence Erlbaum Associates. ER - TY - JOUR T1 - Expansion of a physical function item bank and development of an abbreviated form for clinical research JF - Journal of Applied Measurement Y1 - 2006 A1 - Bode, R. K. A1 - Lai, J-S. A1 - Dineen, K. A1 - Heinemann, A. W. A1 - Shevrin, D. A1 - Von Roenn, J. A1 - Cella, D. KW - clinical research KW - computerized adaptive testing KW - performance levels KW - physical function item bank KW - Psychometrics KW - test reliability KW - Test Validity AB - We expanded an existing 33-item physical function (PF) item bank with a sufficient number of items to enable computerized adaptive testing (CAT). Ten items were written to expand the bank and the new item pool was administered to 295 people with cancer. For this analysis of the new pool, seven poorly performing items were identified for further examination.
This resulted in a bank with items that define an essentially unidimensional PF construct, cover a wide range of that construct, reliably measure the PF of persons with cancer, and distinguish differences in self-reported functional performance levels. We also developed a 5-item (static) assessment form ("BriefPF") that can be used in clinical research to express scores on the same metric as the overall bank. The BriefPF was compared to the PF-10 from the Medical Outcomes Study SF-36. Both short forms significantly differentiated persons across functional performance levels. While the entire bank was more precise across the PF continuum than either short form, there were differences in the area of the continuum in which each short form was more precise: the BriefPF was more precise than the PF-10 at the lower functional levels and the PF-10 was more precise than the BriefPF at the higher levels. Future research on this bank will include the development of a CAT version, the PF-CAT. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Richard M Smith: US VL - 7 SN - 1529-7713 (Print) ER - TY - JOUR T1 - How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation JF - Applied Measurement in Education Y1 - 2006 A1 - Chuah, Siang Chee A1 - F Drasgow A1 - Luecht, Richard VL - 19 UR - http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_5 ER - TY - CONF T1 - A variant of the progressive restricted item exposure control procedure in computerized adaptive testing systems based on the 3PL and the partial credit model T2 - Paper presented at the annual meetings of the American Educational Research Association Y1 - 2006 A1 - McClarty, L. K. A1 - Sperling, R. A1 - Dodd, B. G. JF - Paper presented at the annual meetings of the American Educational Research Association CY - San Francisco ER - TY - JOUR T1 - Assessing mobility in children using a computer adaptive testing version of the pediatric evaluation of disability inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2005 A1 - Haley, S. M. A1 - Raczek, A. E. A1 - Coster, W. J. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. KW - *Computer Simulation KW - *Disability Evaluation KW - Adolescent KW - Child KW - Child, Preschool KW - Cross-Sectional Studies KW - Disabled Children/*rehabilitation KW - Female KW - Humans KW - Infant KW - Male KW - Outcome Assessment (Health Care)/*methods KW - Rehabilitation Centers KW - Rehabilitation/*standards KW - Sensitivity and Specificity AB - OBJECTIVE: To assess score agreement, validity, precision, and response burden of a prototype computerized adaptive testing (CAT) version of the Mobility Functional Skills Scale (Mob-CAT) of the Pediatric Evaluation of Disability Inventory (PEDI) as compared with the full 59-item version (Mob-59). DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; and cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics, community-based day care, preschool, and children's homes. PARTICIPANTS: Four hundred sixty-nine children with disabilities and 412 children with no disabilities (analytic sample); 41 children without disabilities and 39 with disabilities (cross-validation sample). INTERVENTIONS: Not applicable. 
MAIN OUTCOME MEASURES: Summary scores from a prototype Mob-CAT application and versions using 15-, 10-, and 5-item stopping rules; scores from the Mob-59; and number of items and time (in seconds) to administer assessments. RESULTS: Mob-CAT scores from both computer simulations (intraclass correlation coefficient [ICC] range, .94-.99) and field administrations (ICC=.98) were in high agreement with scores from the Mob-59. Using computer simulations of retrospective data, discriminant validity, and sensitivity to change of the Mob-CAT closely approximated that of the Mob-59, especially when using the 15- and 10-item stopping rule versions of the Mob-CAT. The Mob-CAT used no more than 15% of the items for any single administration, and required 20% of the time needed to administer the Mob-59. CONCLUSIONS: Comparable score estimates for the PEDI mobility scale can be obtained from CAT administrations, with losses in validity and precision for shorter forms, but with a considerable reduction in administration time. VL - 86 SN - 0003-9993 (Print) N1 - Haley, Stephen MRaczek, Anastasia ECoster, Wendy JDumas, Helene MFragala-Pinkham, Maria AK02 hd45354-01a1/hd/nichdR43 hd42388-01/hd/nichdResearch Support, N.I.H., ExtramuralResearch Support, U.S. Gov't, P.H.S.United StatesArchives of physical medicine and rehabilitationArch Phys Med Rehabil. 2005 May;86(5):932-9. ER - TY - JOUR T1 - Assessing Mobility in Children Using a Computer Adaptive Testing Version of the Pediatric Evaluation of Disability Inventory JF - Archives of Physical Medicine and Rehabilitation Y1 - 2005 A1 - Haley, S. A1 - Raczek, A. A1 - Coster, W. A1 - Dumas, H. A1 - Fragalapinkham, M. VL - 86 SN - 00039993 ER - TY - JOUR T1 - A Bayesian student model without hidden nodes and its comparison with item response theory JF - International Journal of Artificial Intelligence in Education Y1 - 2005 A1 - Desmarais, M. C. A1 - Pu, X. KW - Bayesian Student Model KW - computer adaptive testing KW - hidden nodes KW - Item Response Theory AB - The Bayesian framework offers a number of techniques for inferring an individual's knowledge state from evidence of mastery of concepts or skills. A typical application where such a technique can be useful is Computer Adaptive Testing (CAT). A Bayesian modeling scheme, POKS, is proposed and compared to the traditional Item Response Theory (IRT), which has been the prevalent CAT approach for the last three decades. POKS is based on the theory of knowledge spaces and constructs item-to-item graph structures without hidden nodes. It aims to offer an effective knowledge assessment method with an efficient algorithm for learning the graph structure from data. We review the different Bayesian approaches to modeling student ability assessment and discuss how POKS relates to them. The performance of POKS is compared to the IRT two parameter logistic model. Experimental results over a 34 item Unix test and a 160 item French language test show that both approaches can classify examinees as master or non-master effectively and efficiently, with relatively comparable performance. However, more significant differences are found in favor of POKS for a second task that consists in predicting individual question item outcome. Implications of these results for adaptive testing and student modeling are discussed, as well as the limitations and advantages of POKS, namely the issue of integrating concepts into its structure. 
(PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - IOS Press: Netherlands VL - 15 SN - 1560-4292 (Print); 1560-4306 (Electronic) ER - TY - JOUR T1 - Computerized Adaptive Testing With the Partial Credit Model: Estimation Procedures, Population Distributions, and Item Pool Characteristics JF - Applied Psychological Measurement Y1 - 2005 A1 - Gorin, Joanna S. A1 - Dodd, Barbara G. A1 - Fitzpatrick, Steven J. A1 - Shieh, Yann Yann AB -

The primary purpose of this research is to examine the impact of estimation methods, actual latent trait distributions, and item pool characteristics on the performance of a simulated computerized adaptive testing (CAT) system. In this study, three estimation procedures are compared for accuracy of estimation: maximum likelihood estimation (MLE), expected a posteriori (EAP) estimation, and Warm's weighted likelihood estimation (WLE). Some research has shown that MLE and EAP perform equally well in polytomous CAT systems under certain conditions, namely when the prior matches the actual latent trait distribution. However, little research has compared these methods when prior estimates of the latent trait distribution are extremely poor. In general, it appears that MLE, EAP, and WLE procedures perform equally well when using an optimal item pool. However, the use of EAP procedures may be advantageous under nonoptimal testing conditions, when the item pool is not appropriately matched to the examinees.
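
To make the contrast between these estimators concrete, here is a small grid-quadrature sketch of EAP versus MLE scoring. It uses dichotomous 2PL items for brevity, whereas the study above involves polytomous (partial credit) items; all parameters and responses are illustrative.

```python
# Grid-quadrature sketch of EAP (posterior mean under a standard normal prior) versus
# a grid-search MLE, using dichotomous 2PL items for brevity. Illustrative values only.
import numpy as np

theta_grid = np.linspace(-4.0, 4.0, 161)
prior = np.exp(-0.5 * theta_grid**2)             # standard normal prior (unnormalized)

def likelihood(responses, a, b):
    """Likelihood of the observed response pattern at every grid point."""
    L = np.ones_like(theta_grid)
    for u, ai, bi in zip(responses, a, b):
        p = 1.0 / (1.0 + np.exp(-ai * (theta_grid - bi)))
        L *= p**u * (1.0 - p)**(1 - u)
    return L

a = np.array([1.2, 0.8, 1.5, 1.0])               # discriminations
b = np.array([-0.5, 0.0, 0.7, 1.2])              # difficulties
u = np.array([1, 1, 0, 0])                       # observed item responses

L = likelihood(u, a, b)
eap = np.sum(theta_grid * L * prior) / np.sum(L * prior)   # EAP: posterior mean
mle = theta_grid[np.argmax(L)]                             # MLE: likelihood peak on the grid
print(round(float(eap), 3), round(float(mle), 3))
```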

VL - 29 UR - http://apm.sagepub.com/content/29/6/433.abstract ER - TY - JOUR T1 - An item bank was created to improve the measurement of cancer-related fatigue JF - Journal of Clinical Epidemiology Y1 - 2005 A1 - Lai, J-S. A1 - Cella, D. A1 - Dineen, K. A1 - Bode, R. A1 - Von Roenn, J. A1 - Gershon, R. C. A1 - Shevrin, D. KW - Adult KW - Aged KW - Aged, 80 and over KW - Factor Analysis, Statistical KW - Fatigue/*etiology/psychology KW - Female KW - Humans KW - Male KW - Middle Aged KW - Neoplasms/*complications/psychology KW - Psychometrics KW - Questionnaires AB - OBJECTIVE: Cancer-related fatigue (CRF) is one of the most common unrelieved symptoms experienced by patients. CRF is underrecognized and undertreated due to a lack of clinically sensitive instruments that integrate easily into clinics. Modern computerized adaptive testing (CAT) can overcome these obstacles by enabling precise assessment of fatigue without requiring the administration of a large number of questions. A working item bank is essential for development of a CAT platform. The present report describes the building of an operational item bank for use in clinical settings with the ultimate goal of improving CRF identification and treatment. STUDY DESIGN AND SETTING: The sample included 301 cancer patients. Psychometric properties of items were examined by using Rasch analysis, an Item Response Theory (IRT) model. RESULTS AND CONCLUSION: The final bank includes 72 items. These 72 unidimensional items explained 57.5% of the variance, based on factor analysis results. Excellent internal consistency (alpha=0.99) and acceptable item-total correlation were found (range: 0.51-0.85). The 72 items covered a reasonable range of the fatigue continuum. No significant ceiling effects, floor effects, or gaps were found. A sample short form was created for demonstration purposes. The resulting bank is amenable to the development of a CAT platform. VL - 58 SN - 0895-4356 (Print)0895-4356 (Linking) N1 - Lai, Jin-SheiCella, DavidDineen, KellyBode, RitaVon Roenn, JamieGershon, Richard CShevrin, DanielEnglandJ Clin Epidemiol. 2005 Feb;58(2):190-7. ER - TY - JOUR T1 - [Item characteristic curve equating under graded response models in IRT] JF - Acta Psychologica Sinica Y1 - 2005 A1 - Jun, Z. A1 - Dongming, O. A1 - Shuyuan, X. A1 - Haiqi, D. A1 - Shuqing, Q. KW - graded response models KW - item characteristic curve KW - Item Response Theory AB - In one of the largest qualificatory tests--economist test, to guarantee the comparability among different years, construct item bank and prepare for computerized adaptive testing, item characteristic curve equating and anchor test equating design under graded models in IRT are used, which have realized the item and ability parameter equating of test data in five years and succeeded in establishing an item bank. Based on it, cut scores of different years are compared by equating and provide demonstrational gist to constitute the eligibility standard of economist test. PB - Science Press: China VL - 37 SN - 0439-755X (Print) ER - TY - JOUR T1 - An item response theory-based pain item bank can enhance measurement precision JF - Journal of Pain and Symptom Management Y1 - 2005 A1 - Lai, J-S. A1 - Dineen, K. A1 - Reeve, B. B. A1 - Von Roenn, J. A1 - Shervin, D. A1 - McGuire, M. A1 - Bode, R. K. A1 - Paice, J. A1 - Cella, D. KW - computerized adaptive testing AB - Cancer-related pain is often under-recognized and undertreated. 
This is partly due to the lack of appropriate assessments, which need to be comprehensive and precise yet easily integrated into clinics. Computerized adaptive testing (CAT) can enable precise-yet-brief assessments by only selecting the most informative items from a calibrated item bank. The purpose of this study was to create such a bank. The sample included 400 cancer patients who were asked to complete 61 pain-related items. Data were analyzed using factor analysis and the Rasch model. The final bank consisted of 43 items which satisfied the measurement requirement of factor analysis and the Rasch model, demonstrated high internal consistency and reasonable item-total correlations, and discriminated patients with differing degrees of pain. We conclude that this bank demonstrates good psychometric properties, is sensitive to pain reported by patients, and can be used as the foundation for a CAT pain-testing platform for use in clinical practice. VL - 30 N1 - 0885-3924Journal Article ER - TY - ABST T1 - Strategies for controlling item exposure in computerized adaptive testing with the partial credit model Y1 - 2005 A1 - Davis, L. L. A1 - Dodd, B. CY - Pearson Educational Measurement Research Report 05-01 ER - TY - JOUR T1 - Test construction for cognitive diagnosis JF - Applied Psychological Measurement Y1 - 2005 A1 - Henson, R. K. A1 - Douglas, J. KW - (Measurement) KW - Cognitive Assessment KW - Item Analysis (Statistical) KW - Profiles KW - Test Construction KW - Test Interpretation KW - Test Items AB - Although cognitive diagnostic models (CDMs) can be useful in the analysis and interpretation of existing tests, little has been developed to specify how one might construct a good test using aspects of the CDMs. This article discusses the derivation of a general CDM index based on Kullback-Leibler information that will serve as a measure of how informative an item is for the classification of examinees. The effectiveness of the index is examined for items calibrated using the deterministic input noisy "and" gate model (DINA) and the reparameterized unified model (RUM) by implementing a simple heuristic to construct a test from an item bank. When compared to randomly constructed tests from the same item bank, the heuristic shows significant improvement in classification rates. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 29 ER - TY - JOUR T1 - Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire JF - Alcoholism: Clinical & Experimental Research Y1 - 2005 A1 - Kahler, C. W. A1 - Strong, D. R. A1 - Read, J. P. A1 - De Boeck, P. A1 - Wilson, M. A1 - Acton, G. S. A1 - Palfai, T. P. A1 - Wood, M. D. A1 - Mehta, P. D. A1 - Neale, M. C. A1 - Flay, B. R. A1 - Conklin, C. A. A1 - Clayton, R. R. A1 - Tiffany, S. T. A1 - Shiffman, S. A1 - Krueger, R. F. A1 - Nichol, P. E. A1 - Hicks, B. M. A1 - Markon, K. E. A1 - Patrick, C. J. A1 - Iacono, William G. A1 - McGue, Matt A1 - Langenbucher, J. W. A1 - Labouvie, E. A1 - Martin, C. S. A1 - Sanjuan, P. M. A1 - Bavly, L. A1 - Kirisci, L. A1 - Chung, T. A1 - Vanyukov, M. A1 - Dunn, M. A1 - Tarter, R. A1 - Handel, R. W. A1 - Ben-Porath, Y. S. A1 - Watt, M. 
KW - Psychometrics KW - Substance-Related Disorders AB - Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias., Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis. An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample., Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items., Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided., (C)2005Research Society on AlcoholismAn important, sometimes controversial feature of all psychological phenomena is whether they are categorical or dimensional. A conceptual and psychometric framework is described for distinguishing whether the latent structure behind manifest categories (e.g., psychiatric diagnoses, attitude groups, or stages of development) is category-like or dimension-like. Being dimension-like requires (a) within-category heterogeneity and (b) between-category quantitative differences. Being category-like requires (a) within-category homogeneity and (b) between-category qualitative differences. The relation between this classification and abrupt versus smooth differences is discussed. Hybrid structures are possible. Being category-like is itself a matter of degree; the authors offer a formalized framework to determine this degree. Empirical applications to personality disorders, attitudes toward capital punishment, and stages of cognitive development illustrate the approach., (C) 2005 by the American Psychological AssociationThe authors conducted Rasch model ( G. Rasch, 1960) analyses of items from the Young Adult Alcohol Problems Screening Test (YAAPST; S. C. Hurlbut & K. J. Sher, 1992) to examine the relative severity and ordering of alcohol problems in 806 college students. Items appeared to measure a single dimension of alcohol problem severity, covering a broad range of the latent continuum. Items fit the Rasch model well, with less severe symptoms reliably preceding more severe symptoms in a potential progression toward increasing levels of problem severity. 
However, certain items did not index problem severity consistently across demographic subgroups. A shortened, alternative version of the YAAPST is proposed, and a norm table is provided that allows for a linking of total YAAPST scores to expected symptom expression., (C) 2004 by the American Psychological AssociationA didactic on latent growth curve modeling for ordinal outcomes is presented. The conceptual aspects of modeling growth with ordinal variables and the notion of threshold invariance are illustrated graphically using a hypothetical example. The ordinal growth model is described in terms of 3 nested models: (a) multivariate normality of the underlying continuous latent variables (yt) and its relationship with the observed ordinal response pattern (Yt), (b) threshold invariance over time, and (c) growth model for the continuous latent variable on a common scale. Algebraic implications of the model restrictions are derived, and practical aspects of fitting ordinal growth models are discussed with the help of an empirical example and Mx script ( M. C. Neale, S. M. Boker, G. Xie, & H. H. Maes, 1999). The necessary conditions for the identification of growth models with ordinal data and the methodological implications of the model of threshold invariance are discussed., (C) 2004 by the American Psychological AssociationRecent research points toward the viability of conceptualizing alcohol problems as arrayed along a continuum. Nevertheless, modern statistical techniques designed to scale multiple problems along a continuum (latent trait modeling; LTM) have rarely been applied to alcohol problems. This study applies LTM methods to data on 110 problems reported during in-person interviews of 1,348 middle-aged men (mean age = 43) from the general population. The results revealed a continuum of severity linking the 110 problems, ranging from heavy and abusive drinking, through tolerance and withdrawal, to serious complications of alcoholism. These results indicate that alcohol problems can be arrayed along a dimension of severity and emphasize the relevance of LTM to informing the conceptualization and assessment of alcohol problems., (C) 2004 by the American Psychological AssociationItem response theory (IRT) is supplanting classical test theory as the basis for measures development. This study demonstrated the utility of IRT for evaluating DSM-IV diagnostic criteria. Data on alcohol, cannabis, and cocaine symptoms from 372 adult clinical participants interviewed with the Composite International Diagnostic Interview-Expanded Substance Abuse Module (CIDI-SAM) were analyzed with Mplus ( B. Muthen & L. Muthen, 1998) and MULTILOG ( D. Thissen, 1991) software. Tolerance and legal problems criteria were dropped because of poor fit with a unidimensional model. Item response curves, test information curves, and testing of variously constrained models suggested that DSM-IV criteria in the CIDI-SAM discriminate between only impaired and less impaired cases and may not be useful to scale case severity. IRT can be used to study the construct validity of DSM-IV diagnoses and to identify diagnostic criteria with poor performance., (C) 2004 by the American Psychological AssociationThis study examined the psychometric characteristics of an index of substance use involvement using item response theory. 
The sample consisted of 292 men and 140 women who qualified for a Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.; American Psychiatric Association, 1987) substance use disorder (SUD) diagnosis and 293 men and 445 women who did not qualify for a SUD diagnosis. The results indicated that men had a higher probability of endorsing substance use compared with women. The index significantly predicted health, psychiatric, and psychosocial disturbances as well as level of substance use behavior and severity of SUD after a 2-year follow-up. Finally, this index is a reliable and useful prognostic indicator of the risk for SUD and the medical and psychosocial sequelae of drug consumption., (C) 2002 by the American Psychological AssociationComparability, validity, and impact of loss of information of a computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 140 Veterans Affairs hospital patients. The countdown method ( Butcher, Keller, & Bacon, 1985) was used to adaptively administer Scales L (Lie) and F (Frequency), the 10 clinical scales, and the 15 content scales. Participants completed the MMPI-2 twice, in 1 of 2 conditions: computerized conventional test-retest, or computerized conventional-computerized adaptive. Mean profiles and test-retest correlations across modalities were comparable. Correlations between MMPI-2 scales and criterion measures supported the validity of the countdown method, although some attenuation of validity was suggested for certain health-related items. Loss of information incurred with this mode of adaptive testing has minimal impact on test validity. Item and time savings were substantial., (C) 1999 by the American Psychological Association VL - 29 N1 - MiscellaneousArticleMiscellaneous Article ER - TY - JOUR T1 - A computerized adaptive knowledge test as an assessment tool in general practice: a pilot study JF - Medical Teacher Y1 - 2004 A1 - Roex, A. A1 - Degryse, J. KW - *Computer Systems KW - Algorithms KW - Educational Measurement/*methods KW - Family Practice/*education KW - Humans KW - Pilot Projects AB - Advantageous to assessment in many fields, CAT (computerized adaptive testing) use in general practice has been scarce. In adapting CAT to general practice, the basic assumptions of item response theory and the case specificity must be taken into account. In this context, this study first evaluated the feasibility of converting written extended matching tests into CAT. Second, it questioned the content validity of CAT. A stratified sample of students was invited to participate in the pilot study. The items used in this test, together with their parameters, originated from the written test. The detailed test paths of the students were retained and analysed thoroughly. Using the predefined pass-fail standard, one student failed the test. There was a positive correlation between the number of items and the candidate's ability level. The majority of students were presented with questions in seven of the 10 existing domains. Although proved to be a feasible test format, CAT cannot substitute for the existing high-stakes large-scale written test. It may provide a reliable instrument for identifying candidates who are at risk of failing in the written test. VL - 26 N1 - 0142-159xJournal Article ER - TY - CONF T1 - Item parameter recovery with adaptive tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2004 A1 - Do, B.-R. 
A1 - Chuah, S. C. A1 - F Drasgow JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego CA N1 - #DO04-01 {PDF file, 379 KB} ER - TY - JOUR T1 - Strategies for Controlling Item Exposure in Computerized Adaptive Testing With the Generalized Partial Credit Model JF - Applied Psychological Measurement Y1 - 2004 A1 - Davis, Laurie Laughlin AB -

Choosing a strategy for controlling item exposure has become an integral part of test development for computerized adaptive testing (CAT). This study investigated the performance of six procedures for controlling item exposure in a series of simulated CATs under the generalized partial credit model. In addition to a no-exposure control baseline condition, the randomesque, modified-within-.10-logits, Sympson-Hetter, conditional Sympson-Hetter, a-stratified with multiple-stratification, and enhanced a-stratified with multiple-stratification procedures were implemented to control exposure rates. Two variations of the randomesque and modified-within-.10-logits procedures were examined, which varied the size of the item group from which the next item to be administered was randomly selected. The results indicate that although the conditional Sympson-Hetter provides somewhat lower maximum exposure rates, the randomesque and modified-within-.10-logits procedures with the six-item group variation have great utility for controlling overlap rates and increasing pool utilization and should be given further consideration.

VL - 28 UR - http://apm.sagepub.com/content/28/3/165.abstract ER - TY - JOUR T1 - Strategies for controlling item exposure in computerized adaptive testing with the generalized partial credit model JF - Applied Psychological Measurement Y1 - 2004 A1 - Davis, L. L. KW - computerized adaptive testing KW - generalized partial credit model KW - item exposure AB - Choosing a strategy for controlling item exposure has become an integral part of test development for computerized adaptive testing (CAT). This study investigated the performance of six procedures for controlling item exposure in a series of simulated CATs under the generalized partial credit model. In addition to a no-exposure control baseline condition, the randomesque, modified-within-.10-logits, Sympson-Hetter, conditional Sympson-Hetter, a-stratified with multiple-stratification, and enhanced a-stratified with multiple-stratification procedures were implemented to control exposure rates. Two variations of the randomesque and modified-within-.10-logits procedures were examined, which varied the size of the item group from which the next item to be administered was randomly selected. The results indicate that although the conditional Sympson-Hetter provides somewhat lower maximum exposure rates, the randomesque and modified-within-.10-logits procedures with the six-item group variation have great utility for controlling overlap rates and increasing pool utilization and should be given further consideration. (PsycINFO Database Record (c) 2007 APA, all rights reserved) PB - Sage Publications: US VL - 28 SN - 0146-6216 (Print) ER - TY - CHAP T1 - Assessing question banks T2 - Reusing online resources: A sustainable approach to e-learning Y1 - 2003 A1 - Bull, J. A1 - Dalziel, J. A1 - Vreeland, T. KW - Computer Assisted Testing KW - Curriculum Based Assessment KW - Education KW - Technology KW - computerized adaptive testing AB - In Chapter 14, Joanna Bull and James Dalziel provide a comprehensive treatment of the issues surrounding the use of Question Banks and Computer Assisted Assessment, and provide a number of excellent examples of implementations. In their review of the technologies employed in Computer Assisted Assessment the authors include Computer Adaptive Testing and data generation. The authors reveal significant issues involving the impact of Intellectual Property rights and computer assisted assessment and make important suggestions for strategies to overcome these obstacles. (PsycINFO Database Record (c) 2005 APA ) http://www-jime.open.ac.uk/2003/1/ (journal abstract) JF - Reusing online resources: A sustainable approach to e-learning PB - Kogan Page Ltd. CY - London, UK ER - TY - JOUR T1 - Can an item response theory-based pain item bank enhance measurement precision? JF - Clinical Therapeutics Y1 - 2003 A1 - Lai, J-S. A1 - Dineen, K. A1 - Cella, D. A1 - Von Roenn, J. VL - 25 JO - Clin Ther ER - TY - CONF T1 - A comparison of exposure control procedures in CAT systems based on different measurement models for testlets using the verbal reasoning section of the MCAT T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Boyd, A. M. A1 - Dodd, B. G. A1 - Fitzpatrick, S. J.
JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 405 KB} ER - TY - CONF T1 - A comparison of item exposure control procedures using a CAT system based on the generalized partial credit model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Burt, W. M A1 - Kim, S.-J A1 - Davis, L. L. A1 - Dodd, B. G. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - {PDF file, 265 KB} ER - TY - CONF T1 - A comparison of learning potential results at various educational levels T2 - Paper presented at the 6th Annual Society for Industrial and Organisational Psychology of South Africa (SIOPSA) conference Y1 - 2003 A1 - De Beer, M. JF - Paper presented at the 6th Annual Society for Industrial and Organisational Psychology of South Africa (SIOPSA) conference CY - 25-27 June 2003 N1 - {PDF file, 391 KB} ER - TY - JOUR T1 - A computer adaptive testing simulation applied to the FIM instrument motor component JF - Arch Phys Med Rehabil Y1 - 2003 A1 - Dijkers, M.P. VL - 84 ER - TY - CONF T1 - Development of the Learning Potential Computerised Adaptive Test (LPCAT) T2 - Unpublished manuscript. Y1 - 2003 A1 - De Beer, M. JF - Unpublished manuscript. N1 - {PDF file, 563 KB} ER - TY - JOUR T1 - An examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model JF - Journal of Applied Measurement Y1 - 2003 A1 - Davis, L. L. A1 - Pastor, D. A. A1 - Dodd, B. G. A1 - Chiang, C. A1 - Fitzpatrick, S. J. KW - *Computers KW - *Educational Measurement KW - *Models, Theoretical KW - Automation KW - Decision Making KW - Humans KW - Reproducibility of Results AB - The purpose of the present investigation was to systematically examine the effectiveness of the Sympson-Hetter technique and rotated content balancing relative to no exposure control and no content rotation conditions in a computerized adaptive testing system (CAT) based on the partial credit model. A series of simulated fixed and variable length CATs were run using two data sets generated to multiple content areas for three sizes of item pools. The 2 (exposure control) X 2 (content rotation) X 2 (test length) X 3 (item pool size) X 2 (data sets) yielded a total of 48 conditions. Results show that while both procedures can be used with no deleterious effect on measurement precision, the gains in exposure control, pool utilization, and item overlap appear quite modest. Difficulties involved with setting the exposure control parameters in small item pools make questionable the utility of the Sympson-Hetter technique with similar item pools. VL - 4 N1 - 1529-7713Journal Article ER - TY - JOUR T1 - Item exposure constraints for testlets in the verbal reasoning section of the MCAT JF - Applied Psychological Measurement Y1 - 2003 A1 - Davis, L. L. A1 - Dodd, B. G. KW - Adaptive Testing KW - Computer Assisted Testing KW - Entrance Examinations KW - Item Response Theory KW - Random Sampling KW - Reasoning KW - Verbal Ability computerized adaptive testing AB - The current study examined item exposure control procedures for testlet scored reading passages in the Verbal Reasoning section of the Medical College Admission Test with four computerized adaptive testing (CAT) systems using the partial credit model. The first system used a traditional CAT using maximum information item selection. 
The second used random item selection to provide a baseline for optimal exposure rates. The third used a variation of Lunz and Stahl's randomization procedure. The fourth used Luecht and Nungester's computerized adaptive sequential testing (CAST) system. A series of simulated fixed-length CATs was run to determine the optimal item length selection procedure. Results indicated that both the randomization procedure and CAST performed well in terms of exposure control and measurement precision, with the CAST system providing the best overall solution when all variables were taken into consideration. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 27 ER - TY - CONF T1 - Maintaining scale in computer adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Smith, R. L. A1 - Rizavi, S. A1 - Paez, R. A1 - Damiano, M. A1 - Herbert, E. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 367 KB} ER - TY - CONF T1 - Predicting item exposure parameters in computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2003 A1 - Chen, S-Y. A1 - Doong, H. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - {PDF file, 239 KB} ER - TY - CONF T1 - A simulation study to compare CAT strategies for cognitive diagnosis T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Xu, X. A1 - Chang, Hua-Hua A1 - Douglas, J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 250 KB} ER - TY - CONF T1 - Strategies for controlling item exposure in computerized adaptive testing with the generalized partial credit model T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Davis, L. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - PDF file, 620 K ER - TY - JOUR T1 - Strategies for controlling item exposure in computerized adaptive testing with polytomously scored items JF - Dissertation Abstracts International: Section B: The Sciences & Engineering Y1 - 2003 A1 - Davis, L. L. AB - Choosing a strategy for controlling the exposure of items to examinees has become an integral part of test development for computerized adaptive testing (CAT). Item exposure can be controlled through the use of a variety of algorithms which modify the CAT item selection process. This may be done through a randomization, conditional selection, or stratification approach. The effectiveness of each procedure as well as the degree to which measurement precision is sacrificed has been extensively studied with dichotomously scored item pools. However, only recently have researchers begun to examine these procedures in polytomously scored item pools. The current study investigated the performance of six different exposure control mechanisms under three polytomous IRT models in terms of measurement precision, test security, and ease of implementation. The three models examined in the current study were the partial credit, generalized partial credit, and graded response models. 
In addition to a no exposure control baseline condition, the randomesque, within .10 logits, Sympson-Hetter, conditional Sympson-Hetter, a-Stratified, and enhanced a-Stratified procedures were implemented to control item exposure rates. The a-Stratified and enhanced a-Stratified procedures were not evaluated with the partial credit model. Two variations of the randomesque and within .10 logits procedures were also examined which varied the size of the item group from which the next item to be administered was randomly selected. The results of this study were remarkably similar for all three models and indicated that the randomesque and within .10 logits procedures, when implemented with the six item group variation, provide the best option for controlling exposure rates when impact to measurement precision and ease of implementation are considered. The three item group variations of the procedures were, however, ineffective in controlling exposure, overlap, and pool utilization rates to desired levels. The Sympson-Hetter and conditional Sympson-Hetter procedures were difficult and time consuming to implement, and while they did control exposure rates to the target level, their performance in terms of item overlap (for the Sympson-Hetter) and pool utilization were disappointing. The a-Stratified and enhanced a-Stratified procedures both turned in surprisingly poor performances across all variables. (PsycINFO Database Record (c) 2004 APA, all rights reserved). VL - 64 ER - TY - JOUR T1 - A study of the feasibility of Internet administration of a computerized health survey: The Headache Impact Test (HIT) JF - Quality of Life Research Y1 - 2003 A1 - Bayliss, M.S. A1 - Dewey, J.E. A1 - Dunlap, I A1 - et. al. VL - 12 ER - TY - CONF T1 - To stratify or not: An investigation of CAT item selection procedures under practical constraints T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Deng, H. A1 - Ansley, T. JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - Chicago IL N1 - {PDF file, 186 KB} ER - TY - JOUR T1 - A comparison of item selection techniques and exposure control mechanisms in CATs using the generalized partial credit model JF - Applied Psychological Measurement Y1 - 2002 A1 - Pastor, D. A. A1 - Dodd, B. G. A1 - Chang, Hua-Hua KW - (Statistical) KW - Adaptive Testing KW - Algorithms computerized adaptive testing KW - Computer Assisted Testing KW - Item Analysis KW - Item Response Theory KW - Mathematical Modeling AB - The use of more performance items in large-scale testing has led to an increase in the research investigating the use of polytomously scored items in computer adaptive testing (CAT). Because this research has to be complemented with information pertaining to exposure control, the present research investigated the impact of using five different exposure control algorithms in two sized item pools calibrated using the generalized partial credit model. The results of the simulation study indicated that the a-stratified design, in comparison to a no-exposure control condition, could be used to reduce item exposure and overlap, increase pool utilization, and only minorly degrade measurement precision. Use of the more restrictive exposure control algorithms, such as the Sympson-Hetter and conditional Sympson-Hetter, controlled exposure to a greater extent but at the cost of measurement precision. 
Because convergence of the exposure control parameters was problematic for some of the more restrictive exposure control algorithms, use of the more simplistic exposure control mechanisms, particularly when the test length to item pool size ratio is large, is recommended. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 26 ER - TY - CHAP T1 - Controlling item exposure and maintaining item security Y1 - 2002 A1 - Davey, T. A1 - Nering, M. CY - C. N. Mills, M. T. Potenza, and J. J. Fremer (Eds.), Computer-Based Testing: Building the Foundation for Future Assessments (pp. 165-191). Mahwah, NJ: Lawrence Erlbaum Associates, Inc. ER - TY - JOUR T1 - An EM approach to parameter estimation for the Zinnes and Griggs paired comparison IRT model JF - Applied Psychological Measurement Y1 - 2002 A1 - Stark, S. A1 - F Drasgow KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Response Theory KW - Maximum Likelihood KW - Personnel Evaluation KW - Statistical Correlation KW - Statistical Estimation AB - Borman et al. recently proposed a computer adaptive performance appraisal system called CARS II that utilizes paired comparison judgments of behavioral stimuli. To implement this approach, the paired comparison ideal point model developed by Zinnes and Griggs was selected. In this article, the authors describe item response and information functions for the Zinnes and Griggs model and present procedures for estimating stimulus and person parameters. Monte Carlo simulations were conducted to assess the accuracy of the parameter estimation procedures. The results indicated that at least 400 ratees (i.e., ratings) are required to obtain reasonably accurate estimates of the stimulus parameters and their standard errors. In addition, latent trait estimation improves as test length increases. The implications of these results for test construction are also discussed. VL - 26 ER - TY - JOUR T1 - Feasibility and acceptability of computerized adaptive testing (CAT) for fatigue monitoring in clinical practice JF - Quality of Life Research Y1 - 2002 A1 - Davis, K. M. A1 - Chang, C-H. A1 - Lai, J-S. A1 - Cella, D. VL - 11(7) ER - TY - CHAP T1 - Innovative item types for computerized testing Y1 - 2002 A1 - Parshall, C. G. A1 - Davey, T. A1 - Pashley, P. CY - In W. J. van der Linden and C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice. Norwell MA: Kluwer (in press). ER - TY - BOOK T1 - Strategies for controlling item exposure in computerized adaptive testing with polytomously scored items Y1 - 2002 A1 - Davis, L. L. CY - Unpublished doctoral dissertation, University of Texas, Austin N1 - {PDF file, 1.83 MB} ER - TY - CONF T1 - Utility of Learning Potential Computerised Adaptive Test (LPCAT) scores in predicting academic performance of bridging students: A comparison with other predictors T2 - Paper presented at the 5th Annual Society for Industrial and Organisational Psychology Congress Y1 - 2002 A1 - De Beer, M. JF - Paper presented at the 5th Annual Society for Industrial and Organisational Psychology Congress CY - Pretoria, South Africa ER - TY - CHAP T1 - The work ahead: A psychometric infrastructure for computerized adaptive tests T2 - Computer-based tests: Building the foundation for future assessment Y1 - 2002 A1 - F Drasgow ED - M. T. Potenza ED - J. J. Fremer ED - W. C.
Ward KW - Adaptive Testing KW - Computer Assisted Testing KW - Educational KW - Measurement KW - Psychometrics AB - (From the chapter) Considers the past and future of computerized adaptive tests and computer-based tests and looks at issues and challenges confronting a testing program as it implements and operates a computer-based test. Recommendations for testing programs from The National Council of Measurement in Education Ad Hoc Committee on Computerized Adaptive Test Disclosure are appended. (PsycINFO Database Record (c) 2005 APA ) JF - Computer-based tests: Building the foundation for future assessment PB - Lawrence Erlbaum Associates, Inc. CY - Mahwah, N.J. USA N1 - Using Smart Source ParsingComputer-based testing: Building the foundation for future assessments. (pp. 1-35). Mahwah, NJ : Lawrence Erlbaum Associates, Publishers. xi, 326 pp ER - TY - CONF T1 - a-stratified computerized adaptive testing with unequal item exposure across strata T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Deng, H. A1 - Chang, Hua-Hua JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - #DE01-01 ER - TY - ABST T1 - An examination of testlet scoring and item exposure constraints in the verbal reasoning section of the MCAT Y1 - 2001 A1 - Davis, L. L. A1 - Dodd, B. G. CY - MCAT Monograph Series: Association of American Medical Colleges ER - TY - CONF T1 - An examination of testlet scoring and item exposure constraints in the Verbal Reasoning section of the MCAT Y1 - 2001 A1 - Davis, L. L. A1 - Dodd, B. G. N1 - {PDF file, 653 KB} ER - TY - JOUR T1 - An examination of the comparative reliability, validity, and accuracy of performance ratings made using computerized adaptive rating scales JF - Journal of Applied Psychology Y1 - 2001 A1 - Borman, W. C. A1 - Buck, D. E. A1 - Hanson, M. A. A1 - Motowidlo, S. J. A1 - Stark, S. A1 - F Drasgow KW - *Computer Simulation KW - *Employee Performance Appraisal KW - *Personnel Selection KW - Adult KW - Automatic Data Processing KW - Female KW - Human KW - Male KW - Reproducibility of Results KW - Sensitivity and Specificity KW - Support, U.S. Gov't, Non-P.H.S. KW - Task Performance and Analysis KW - Video Recording AB - This laboratory research compared the reliability, validity, and accuracy of a computerized adaptive rating scale (CARS) format and 2 relatively common and representative rating formats. The CARS is a paired-comparison rating task that uses adaptive testing principles to present pairs of scaled behavioral statements to the rater to iteratively estimate a ratee's effectiveness on 3 dimensions of contextual performance. Videotaped vignettes of 6 office workers were prepared, depicting prescripted levels of contextual performance, and 112 subjects rated these vignettes using the CARS format and one or the other competing format. Results showed 23%-37% lower standard errors of measurement for the CARS format. In addition, validity was significantly higher for the CARS format (d = .18), and Cronbach's accuracy coefficients showed significantly higher accuracy, with a median effect size of .08. The discussion focuses on possible reasons for the results. 
VL - 86 N1 - 21480345; 0021-9010 Journal Article; Validation Studies ER - TY - CONF T1 - An investigation of the impact of items that exhibit mild DIF on ability estimation in CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Jennings, J. A. A1 - Dodd, B. G. A1 - Fitzpatrick, S. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - ABST T1 - Item and passage selection algorithm simulations for a computerized adaptive version of the verbal section of the Medical College Admission Test (MCAT) Y1 - 2001 A1 - Smith, R. W. A1 - Plake, B. S. A1 - De Ayala, R. J., CY - MCAT Monograph Series ER - TY - JOUR T1 - Algoritmo mixto mínima entropía-máxima información para la selección de ítems en un test adaptativo informatizado JF - Psicothema Y1 - 2000 A1 - Dorronsoro, J. R. A1 - Santa-Cruz, C. A1 - Rubio Franco, V. J. A1 - Aguado García, D. KW - computerized adaptive testing AB - The aim of this study is to compare the efficacy of three different item selection algorithms in computerized adaptive testing (CAT): (a) an algorithm based on maximum item information, (b) an algorithm based on minimum entropy, and (c) a mixed algorithm that applies minimum entropy to the initial items and maximum information to the remaining items, under the hypothesis that the mixed algorithm can make the CAT more efficient. The CAT process was simulated using an emotional adjustment item bank containing 28 graded items in six categories, calibrated with Samejima's (1969) graded response model, taking as responses to the CAT the original responses of the subjects used for the calibration. The initial results show that the mixed criterion is more efficient than either of the other two algorithms taken independently. This efficiency is maximized when the minimum entropy algorithm is restricted to the selection of the first CAT items, since with the responses to these first items the estimation of theta becomes relevant and the maximum information algorithm is optimized. VL - 12 ER - TY - CONF T1 - Applying specific information item selection to a passage-based test T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Thompson, T.D. A1 - Davey, T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA, April ER - TY - BOOK T1 - Computerized adaptive testing: A primer (2nd edition) Y1 - 2000 A1 - Wainer, H., A1 - Dorans, N. A1 - Eignor, D. R. A1 - Flaugher, R. A1 - Green, B. F. A1 - Mislevy, R. A1 - Steinberg, L. A1 - Thissen, D. CY - Hillsdale, N. J.: Lawrence Erlbaum Associates ER - TY - BOOK T1 - The construction and evaluation of a dynamic computerised adaptive test for the measurement of learning potential Y1 - 2000 A1 - De Beer, M. CY - Unpublished D.
Litt et Phil dissertation. University of South Africa, Pretoria. ER - TY - CONF T1 - An examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2000 A1 - Davis, L. L. A1 - Pastor, D. A. A1 - Dodd, B. G. A1 - Chiang, C. A1 - Fitzpatrick, S. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans, LA ER - TY - JOUR T1 - Item selection algorithms in computerized adaptive testing JF - Psicothema Y1 - 2000 A1 - Garcia, David A. A1 - Santa Cruz, C. A1 - Dorronsoro, J. R. A1 - Rubio Franco, V. J. AB - Studied the efficacy of 3 different item selection algorithms in computerized adaptive testing. Ss were 395 university students (aged 20-25 yrs) in Spain. Ss were asked to submit answers via computer to 28 items of a personality questionnaire using item selection algorithms based on maximum item information, entropy, or mixed item-entropy algorithms. The results were evaluated according to ability of Ss to use item selection algorithms and number of questions. Initial results indicate that mixed criteria algorithms were more efficient than information or entropy algorithms for up to 15 questionnaire items, but that differences in efficiency decreased with increasing item number. Implications for developing computer adaptive testing methods are discussed. (PsycINFO Database Record (c) 2002 APA, all rights reserved). VL - 12 N1 - Spanish .Algoritmo mixto minima entropia-maxima informacion para la seleccion de items en un test adaptativo informatizado..Universidad de Oviedo, Spain ER - TY - BOOK T1 - Learning Potential Computerised Adaptive Test (LPCAT): Technical Manual Y1 - 2000 A1 - De Beer, M. CY - Pretoria: UNISA N1 - #deBE00-01 ER - TY - BOOK T1 - Learning Potential Computerised Adaptive Test (LPCAT): User's Manual Y1 - 2000 A1 - De Beer, M. CY - Pretoria: UNISA N1 - #deBE00-02 ER - TY - CONF T1 - Specific information item selection for adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Davey, T. A1 - Fan, M. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans ER - TY - CHAP T1 - Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing Y1 - 2000 A1 - Wainer, H., A1 - Bradlow, E. T. A1 - Du, Z. CY - W. J. van der Linden and C. A. W. Glas (Eds.), Computerized Adaptive Testing: Theory and Practice (pp. 245-270). Norwell MA: Kluwer. ER - TY - CONF T1 - Computerized testing – Issues and applications (Mini-course manual) T2 - Annual Meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Parshall, C. A1 - Davey, T. A1 - Spray, J. A1 - Kalohn, J. JF - Annual Meeting of the National Council on Measurement in Education CY - Montreal ER - TY - CONF T1 - Constructing adaptive tests to parallel conventional programs T2 - Paper presented at the annual meeting of the National council on Measurement in Education Y1 - 1999 A1 - Fan, M. A1 - Thompson, T. A1 - Davey, T. JF - Paper presented at the annual meeting of the National council on Measurement in Education CY - Montreal N1 - #FA99-01 ER - TY - BOOK T1 - Innovations in computerized assessment Y1 - 1999 A1 - F Drasgow A1 - Olson-Buchanan, J. B. 
KW - computerized adaptive testing AB - Chapters in this book present the challenges and dilemmas faced by researchers as they created new computerized assessments, focusing on issues addressed in developing, scoring, and administering the assessments. Chapters are: (1) "Beyond Bells and Whistles; An Introduction to Computerized Assessment" (Julie B. Olson-Buchanan and Fritz Drasgow); (2) "The Development of a Computerized Selection System for Computer Programmers in a Financial Services Company" (Michael J. Zickar, Randall C. Overton, L. Rogers Taylor, and Harvey J. Harms); (3) "Development of the Computerized Adaptive Testing Version of the Armed Services Vocational Aptitude Battery" (Daniel O. Segall and Kathleen E. Moreno); (4) "CAT for Certification and Licensure" (Betty A. Bergstrom and Mary E. Lunz); (5) "Developing Computerized Adaptive Tests for School Children" (G. Gage Kingsbury and Ronald L. Houser); (6) "Development and Introduction of a Computer Adaptive Graduate Record Examinations General Test" (Craig N. Mills); (7) "Computer Assessment Using Visual Stimuli: A Test of Dermatological Skin Disorders" (Terry A. Ackerman, John Evans, Kwang-Seon Park, Claudia Tamassia, and Ronna Turner); (8) "Creating Computerized Adaptive Tests of Music Aptitude: Problems, Solutions, and Future Directions" (Walter P. Vispoel); (9) "Development of an Interactive Video Assessment: Trials and Tribulations" (Fritz Drasgow, Julie B. Olson-Buchanan, and Philip J. Moberg); (10) "Computerized Assessment of Skill for a Highly Technical Job" (Mary Ann Hanson, Walter C. Borman, Henry J. Mogilka, Carol Manning, and Jerry W. Hedge); (11) "Easing the Implementation of Behavioral Testing through Computerization" (Wayne A. Burroughs, Janet Murray, S. Scott Wesley, Debra R. Medina, Stacy L. Penn, Steven R. Gordon, and Michael Catello); and (12) "Blood, Sweat, and Tears: Some Final Comments on Computerized Assessment." (Fritz Drasgow and Julie B. Olson-Buchanan). Each chapter contains references. (Contains 17 tables and 21 figures.) (SLD) PB - Lawrence Erlbaum Associates, Inc. CY - Mahwah, N.J. N1 - EDRS Availability: None. Lawrence Erlbaum Associates, Inc., Publishers, 10 Industrial Avenue, Mahwah, New Jersey 07430-2262 (paperback: ISBN-0-8058-2877-X, $29.95; clothbound: ISBN-0-8058-2876-1, $59.95). Tel: 800-926-6579 (Toll Free). ER - TY - CONF T1 - Performance of the Sympson-Hetter exposure control algorithm with a polytomous item bank T2 - Paper presented at the annual meeting of American Educational Research Association Y1 - 1999 A1 - Pastor, D. A. A1 - Chiang, C. A1 - Dodd, B. G. A1 - Yockey, R. and JF - Paper presented at the annual meeting of American Educational Research Association CY - Montreal, Canada ER - TY - CONF T1 - Pretesting alongside an operational CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Davey, T. A1 - Pommerich, M A1 - Thompson, D. T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CONF T1 - Principles for administering adaptive tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Miller, T. A1 - Davey, T. 
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CHAP T1 - Research and development of a computer-adaptive test of listening comprehension in the less-commonly taught language Hausa Y1 - 1999 A1 - Dunkel, P. CY - M. Chalhoub-Deville (Ed.). Issues in computer-adaptive testing of reading proficiency. Cambridge, UK: Cambridge University Press. ER - TY - CHAP T1 - Alternatives for scoring computerized adaptive tests T2 - Computer-based testing Y1 - 1998 A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. ED - J. J. Fremer ED - W. C. Ward JF - Computer-based testing PB - Lawrence Erlbaum Associates, Inc. CY - Mahwah, N.J., USA ER - TY - CONF T1 - Alternatives for scoring computerized adaptive tests T2 - Paper presented at an Educational Testing Service-sponsored colloquium entitled Computer-based testing: Building the foundations for future assessments Y1 - 1998 A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. JF - Paper presented at an Educational Testing Service-sponsored colloquium entitled Computer-based testing: Building the foundations for future assessments CY - Philadelphia PA ER - TY - CONF T1 - Application of an IRT ideal point model to computer adaptive assessment of job performance T2 - Paper presented at the annual meeting of the Society for Industrial and Organization Psychology Y1 - 1998 A1 - Stark, S. A1 - F Drasgow JF - Paper presented at the annual meeting of the Society for Industrial and Organization Psychology CY - Dallas TX ER - TY - JOUR T1 - A comparison of maximum likelihood estimation and expected a posteriori estimation in CAT using the partial credit model JF - Educational and Psychological Measurement Y1 - 1998 A1 - Chen, S. A1 - Hou, L. A1 - Dodd, B. G. VL - 58 ER - TY - CONF T1 - Computerized adaptive rating scales that measure contextual performance T2 - Paper presented at the 13th annual conference of the Society for Industrial and Organizational Psychology Y1 - 1998 A1 - Borman, W. C. A1 - Hanson, M. A. A1 - Motowidlo, S. J. A1 - F Drasgow A1 - Foster, L. A1 - Kubisiak, U. C. JF - Paper presented at the 13th annual conference of the Society for Industrial and Organizational Psychology CY - Dallas TX ER - TY - CONF T1 - Constructing adaptive tests to parallel conventional programs T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Thompson, T. A1 - Davey, T. A1 - Nering, M. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego ER - TY - CONF T1 - Constructing passage-based tests that parallel conventional programs T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Thompson, T. A1 - Davey, T. A1 - Nering, M. L. JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - CONF T1 - Controlling item exposure and maintaining item security T2 - Paper presented at an Educational Testing Service-sponsored colloquium entitled “Computer-based testing: Building the foundations for future assessments” Y1 - 1998 A1 - Davey, T. A1 - Nering, M. L. JF - Paper presented at an Educational Testing Service-sponsored colloquium entitled “Computer-based testing: Building the foundations for future assessments” CY - Philadelphia PA ER - TY - CONF T1 - Evaluating and insuring measurement precision in adaptive testing T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Davey, T. A1 - Nering, M. L.
JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - CONF T1 - A hybrid method for controlling item exposure in computerized adaptive testing T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Nering, M. L. A1 - Davey, T. A1 - Thompson, T. JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - CONF T1 - Test development exposure control for adaptive testing T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Parshall, C. G. A1 - Davey, T. A1 - Nering, M. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego, CA ER - TY - JOUR T1 - Computer-adaptive testing of listening comprehension: A blueprint of CAT Development JF - The Language Teacher Online 21 Y1 - 1997 A1 - Dunkel, P. VL - no. 10. ER - TY - JOUR T1 - Developing and scoring an innovative computerized writing assessment JF - Journal of Educational Measurement Y1 - 1997 A1 - Davey, T. A1 - Godwin, J., A1 - Mittelholz, D. VL - 34 ER - TY - JOUR T1 - The effect of population distribution and method of theta estimation on computerized adaptive testing (CAT) using the rating scale model JF - Educational & Psychological Measurement Y1 - 1997 A1 - Chen, S-K. A1 - Hou, L. Y. A1 - Fitzpatrick, S. J. A1 - Dodd, B. G. KW - computerized adaptive testing AB - Investigated the effect of population distribution on maximum likelihood estimation (MLE) and expected a posteriori estimation (EAP) in a simulation study of computerized adaptive testing (CAT) based on D. Andrich's (1978) rating scale model. Comparisons were made among MLE and EAP with a normal prior distribution and EAP with a uniform prior distribution within 2 data sets: one generated using a normal trait distribution and the other using a negatively skewed trait distribution. Descriptive statistics, correlations, scattergrams, and accuracy indices were used to compare the different methods of trait estimation. The EAP estimation with a normal prior or uniform prior yielded results similar to those obtained with MLE, even though the prior did not match the underlying trait distribution. An additional simulation study based on real data suggested that more work is needed to determine the optimal number of quadrature points for EAP in CAT based on the rating scale model. The choice between MLE and EAP for particular measurement situations is discussed. (PsycINFO Database Record (c) 2003 APA, all rights reserved). VL - 57 N1 - Sage Publications, US ER - TY - JOUR T1 - The effect of population distribution and methods of theta estimation on computerized adaptive testing (CAT) using the rating scale model JF - Educational and Psychological Measurement Y1 - 1997 A1 - Chen, S. A1 - Hou, L. A1 - Fitzpatrick, S. J. A1 - Dodd, B. VL - 57 ER - TY - CONF T1 - Realistic simulation procedures for item response data T2 - In T. Miller (Chair), High-dimensional simulation of item response data for CAT research. Psychometric Society Y1 - 1997 A1 - Davey, T. A1 - Nering, M. A1 - Thompson, T. JF - In T.
Miller (Chair), High-dimensional simulation of item response data for CAT research. Psychometric Society CY - Gatlinburg TN N1 - Symposium presented at the annual meeting of the Psychometric Society, Gatlinburg TN. ER - TY - CONF T1 - Simulation of realistic ability vectors T2 - Paper presented at the Psychometric Society meeting Y1 - 1997 A1 - Nering, M. A1 - Thompson, T.D. A1 - Davey, T. JF - Paper presented at the Psychometric Society meeting CY - Gatlinburg TN ER - TY - CHAP T1 - Adaptive assessment and training using the neighbourhood of knowledge states Y1 - 1996 A1 - Dowling, C. E. A1 - Hockemeyer, C. A1 - Ludwig, A. H. CY - Frasson, C. and Gauthier, G. and Lesgold, A. (eds.) Intelligent Tutoring Systems, Third International Conference, ITS'96, Montreal, Canada, June 1996 Proceedings. Lecture Notes in Computer Science 1086. Berlin Heidelberg: Springer-Verlag 578-587. ER - TY - CONF T1 - Constructing adaptive tests to parallel conventional programs T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Davey, T. A1 - Thomas, L. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York ER - TY - CONF T1 - Person-fit indices and their role in the CAT environment T2 - Paper presented at the Annual meeting of the National Council on Measurement in Education Y1 - 1996 A1 - David, L. A. A1 - Lewis, C. JF - Paper presented at the Annual meeting of the National Council on Measurement in Education CY - New York NY ER - TY - ABST T1 - Preliminary cost-effectiveness analysis of alternative ASVAB testing concepts at MET sites Y1 - 1996 A1 - Hogan, P.F. A1 - Dall, T. A1 - J. R. McBride CY - Interim report to Defense Manpower Data Center. Fairfax, VA: Lewin-VHI, Inc. ER - TY - JOUR T1 - Validity of item selection: A comparison of automated computerized adaptive and manual paper and pencil examinations JF - Teaching and Learning in Medicine Y1 - 1996 A1 - Lunz, M. E. A1 - Deville, C. W. VL - 8 ER - TY - JOUR T1 - Computerized Adaptive Testing With Polytomous Items JF - Applied Psychological Measurement Y1 - 1995 A1 - Dodd, B. G. A1 - De Ayala, R. J. A1 - Koch, W. R. VL - 19 IS - 1 ER - TY - JOUR T1 - Computerized adaptive testing with polytomous items JF - Applied Psychological Measurement Y1 - 1995 A1 - Dodd, B. G. A1 - De Ayala, R. J., A1 - Koch, W. R. AB - Discusses polytomous item response theory models and the research that has been conducted to investigate a variety of possible operational procedures (item bank, item selection, trait estimation, stopping rule) for polytomous model-based computerized adaptive testing (PCAT). Studies are reviewed that compared PCAT systems based on competing item response theory models that are appropriate for the same measurement objective, as well as applications of PCAT in marketing and educational psychology. Directions for future research using PCAT are suggested. VL - 19 ER - TY - CONF T1 - The effect of population distribution and methods of theta estimation on CAT using the rating scale model T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Chen, S. A1 - Hou, L. A1 - Fitzpatrick, S. J. A1 - Dodd, B. G.
JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco ER - TY - ABST T1 - The introduction and comparability of the computer-adaptive GRE General Test (GRE Board Professional Report 88-08ap; Educational Testing Service Research Report 95-20) Y1 - 1995 A1 - Schaeffer, G. A. A1 - Steffen, M. A1 - Golub-Smith, M. L. A1 - Mills, C. N. A1 - Durso, R. CY - Princeton NJ: Educational Testing Service ER - TY - JOUR T1 - An investigation of procedures for computerized adaptive testing using the successive intervals Rasch model JF - Educational and Psychological Measurement Y1 - 1995 A1 - Koch, W. R. A1 - Dodd, B. G. VL - 55 ER - TY - CONF T1 - New algorithms for item selection and exposure control with computerized adaptive testing T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1995 A1 - Davey, T. A1 - Parshall, C. G. JF - Paper presented at the annual meeting of the American Educational Research Association CY - San Francisco CA ER - TY - CHAP T1 - Prerequisite relationships for the adaptive assessment of knowledge Y1 - 1995 A1 - Dowling, C. E. A1 - Kaluscha, R. CY - Greer, J. (Ed.) Proceedings of AIED'95, 7th World Conference on Artificial Intelligence in Education, Washington, DC, AACE 43-50. ER - TY - CONF T1 - Some new methods for content balancing adaptive tests T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1995 A1 - Segall, D. O. A1 - Davey, T. C. JF - Paper presented at the annual meeting of the Psychometric Society CY - Minneapolis MN ER - TY - JOUR T1 - Computer adaptive testing: Assessment of the future JF - Curriculum/Technology Quarterly Y1 - 1994 A1 - Diones, R. A1 - Everson, H. VL - 4 (2) ER - TY - ABST T1 - Computerized mastery testing using fuzzy set decision theory (Research Report 94-37) Y1 - 1994 A1 - Du, Y. A1 - Lewis, C. A1 - Pashley, P. J. CY - Princeton NJ: Educational Testing Service ER - TY - BOOK T1 - A comparison of computer adaptive test administration methods Y1 - 1993 A1 - Dolan, S. CY - Unpublished doctoral dissertation, University of Chicago ER - TY - CONF T1 - Computerized adaptive testing in computer science: assessing student programming abilities T2 - Proceedings of the twenty-fourth SIGCSE Technical Symposium on Computer Science Education Y1 - 1993 A1 - Syang, A. A1 - Dale, N.B. JF - Proceedings of the twenty-fourth SIGCSE Technical Symposium on Computer Science Education CY - Indianapolis IN ER - TY - JOUR T1 - Computerized adaptive testing using the partial credit model: Effects of item pool characteristics and different stopping rules JF - Educational and Psychological Measurement Y1 - 1993 A1 - Dodd, B. G. A1 - Koch, W. R. A1 - De Ayala, R. J., AB - Simulated datasets were used to research the effects of the systematic variation of three major variables on the performance of computerized adaptive testing (CAT) procedures for the partial credit model. The three variables studied were the stopping rule for terminating the CATs, item pool size, and the distribution of the difficulty of the items in the pool. Results indicated that the standard error stopping rule performed better across the variety of CAT conditions than the minimum information stopping rule. In addition, it was found that item pools that consisted of as few as 30 items were adequate for CAT provided that the item pool was of medium difficulty. The implications of these findings for implementing CAT systems based on the partial credit model are discussed.
VL - 53 ER - TY - JOUR T1 - Computerized mastery testing using fuzzy set decision theory JF - Applied Measurement in Education Y1 - 1993 A1 - Du, Y. A1 - Lewis, C. A1 - Pashley, P. J. VL - 6 N1 - (Also Educational Testing Service Research Report 94-37) ER - TY - JOUR T1 - Moving in a new direction: Computerized adaptive testing (CAT) JF - Nursing Management Y1 - 1993 A1 - Jones-Dickson, C. A1 - Dorsey, D. A1 - Campbell-Warnock, J. A1 - Fields, F. KW - *Computers KW - Accreditation/methods KW - Educational Measurement/*methods KW - Licensure, Nursing KW - United States VL - 24 SN - 0744-6314 (Print) N1 - Jones-Dickson, C., Dorsey, D., Campbell-Warnock, J., Fields, F. United States. Nursing Management. Nurs Manage. 1993 Jan;24(1):80, 82. ER - TY - JOUR T1 - The application of latent class models in adaptive testing JF - Psychometrika Y1 - 1992 A1 - Macready, G. B. A1 - Dayton, C. M. VL - 57 ER - TY - JOUR T1 - A comparison of the partial credit and graded response models in computerized adaptive testing JF - Applied Measurement in Education Y1 - 1992 A1 - De Ayala, R. J. A1 - Dodd, B. G. A1 - Koch, W. R. VL - 5 ER - TY - CONF T1 - How review options and administration mode influence scores on computerized vocabulary tests T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1992 A1 - Vispoel, W. P. A1 - Wang, T. A1 - De la Torre, R. A1 - Bleiler, T. A1 - Dings, J. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Francisco CA N1 - #VI92-01 ER - TY - JOUR T1 - The influence of dimensionality on CAT ability estimation JF - Educational and Psychological Measurement Y1 - 1992 A1 - De Ayala, R. J., VL - 52 ER - TY - JOUR T1 - The Nominal Response Model in Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 1992 A1 - De Ayala, R. J. VL - 15 ER - TY - JOUR T1 - The nominal response model in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 1992 A1 - De Ayala, R. J., VL - 16 ER - TY - ABST T1 - An analysis of CAT-ASVAB scores in the Marine Corps JPM data (CRM-91-161) Y1 - 1991 A1 - Divgi, D. R. CY - Alexandria VA: Center for Naval Analysis ER - TY - ABST T1 - Collected works on the legal aspects of computerized adaptive testing Y1 - 1991 A1 - Stenson, H. A1 - Graves, P. A1 - Gardiner, J. A1 - Dally, L. CY - Chicago, IL: National Council of State Boards of Nursing, Inc ER - TY - CONF T1 - The development and evaluation of a computerized adaptive testing system T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1991 A1 - De la Torre, R. A1 - Vispoel, W. P. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago IL N1 - (ERIC No. ED 338 711) ER - TY - JOUR T1 - Computerized adaptive measurement of attitudes JF - Measurement and Evaluation in Counseling and Development Y1 - 1990 A1 - Koch, W. R. A1 - Dodd, B. G. A1 - Fitzpatrick, S. J. VL - 23 ER - TY - BOOK T1 - Computerized adaptive testing: A primer (Eds.) Y1 - 1990 A1 - Wainer, H., A1 - Dorans, N. J. A1 - Flaugher, R. A1 - Green, B. F. A1 - Mislevy, R. J. A1 - Steinberg, L. A1 - Thissen, D. CY - Hillsdale NJ: Erlbaum ER - TY - JOUR T1 - The effect of item selection procedure and stepsize on computerized adaptive attitude measurement using the rating scale model JF - Applied Psychological Measurement Y1 - 1990 A1 - Dodd, B. G.
AB - Real and simulated datasets were used to investigate the effects of the systematic variation of two major variables on the operating characteristics of computerized adaptive testing (CAT) applied to instruments consisting of polychotomously scored rating scale items. The two variables studied were the item selection procedure and the stepsize method used until maximum likelihood trait estimates could be calculated. The findings suggested that (1) item pools that consist of as few as 25 items may be adequate for CAT; (2) the variable stepsize method of preliminary trait estimation produced fewer cases of nonconvergence than the use of a fixed stepsize procedure; and (3) the scale value item selection procedure used in conjunction with a minimum standard error stopping rule outperformed the information item selection technique used in conjunction with a minimum information stopping rule in terms of the frequencies of nonconvergent cases, the number of items administered, and the correlations of CAT theta estimates with full scale estimates and known theta values. The implications of these findings for implementing CAT with rating scale items are discussed. VL - 14 ER - TY - JOUR T1 - The Effect of Item Selection Procedure and Stepsize on Computerized Adaptive Attitude Measurement Using the Rating Scale Model JF - Applied Psychological Measurement Y1 - 1990 A1 - Dodd, B. G. VL - 14 IS - 4 ER - TY - CHAP T1 - Future challenges Y1 - 1990 A1 - Wainer, H., A1 - Dorans, N. J. A1 - Green, B. F. A1 - Mislevy, R. J. A1 - Steinberg, L. A1 - Thissen, D. CY - H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 233-272). Hillsdale NJ: Erlbaum. ER - TY - JOUR T1 - A simulation and comparison of flexilevel and Bayesian computerized adaptive testing JF - Journal of Educational Measurement Y1 - 1990 A1 - De Ayala, R. J., A1 - Dodd, B. G. A1 - Koch, W. R. KW - computerized adaptive testing AB - Computerized adaptive testing (CAT) is a testing procedure that adapts an examination to an examinee's ability by administering only items of appropriate difficulty for the examinee. In this study, the authors compared Lord's flexilevel testing procedure (flexilevel CAT) with an item response theory-based CAT using Bayesian estimation of ability (Bayesian CAT). Three flexilevel CATs, which differed in test length (36, 18, and 11 items), and three Bayesian CATs were simulated; the Bayesian CATs differed from one another in the standard error of estimate (SEE) used for terminating the test (0.25, 0.10, and 0.05). Results showed that the flexilevel 36- and 18-item CATs produced ability estimates that may be considered as accurate as those of the Bayesian CAT with SEE = 0.10 and comparable to the Bayesian CAT with SEE = 0.05. The authors discuss the implications for classroom testing and for item response theory-based CAT. VL - 27 ER - TY - JOUR T1 - A comparison of the nominal response model and the three-parameter logistic model in computerized adaptive testing JF - Educational and Psychological Measurement Y1 - 1989 A1 - De Ayala, R. J., VL - 49 ER - TY - JOUR T1 - Estimating Reliabilities of Computerized Adaptive Tests JF - Applied Psychological Measurement Y1 - 1989 A1 - Divgi, D. R. VL - 13 IS - 2 ER - TY - JOUR T1 - An investigation of procedures for computerized adaptive testing using partial credit scoring JF - Applied Measurement in Education Y1 - 1989 A1 - Koch, W. R. A1 - Dodd, B. G.
VL - 2 ER - TY - JOUR T1 - Operational characteristics of adaptive testing procedures using the graded response model JF - Applied Psychological Measurement Y1 - 1989 A1 - Dodd, B. G. A1 - Koch, W. R. A1 - De Ayala, R. J., VL - 13 ER - TY - JOUR T1 - Operational Characteristics of Adaptive Testing Procedures Using the Graded Response Model JF - Applied Psychological Measurement Y1 - 1989 A1 - Dodd, B. G. A1 - Koch, W. R. A1 - De Ayala, R. J. VL - 13 IS - 2 ER - TY - CONF T1 - Computerized adaptive attitude measurement: A comparison of the graded response and rating scale models T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1988 A1 - Dodd, B. G. A1 - Koch, W. R. A1 - De Ayala, R. J., JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans ER - TY - JOUR T1 - Computerized adaptive testing: A comparison of the nominal response model and the three parameter model JF - Dissertation Abstracts International Y1 - 1988 A1 - De Ayala, R. J., KW - computerized adaptive testing VL - 48 ER - TY - BOOK T1 - Computerized adaptive testing: The state of the art in assessment at three community colleges Y1 - 1988 A1 - Doucette, D. CY - Laguna Hills CA: League for Innovation in the Community College ER - TY - CONF T1 - Computerized adaptive testing: A comparison of the nominal response model and the three-parameter logistic model T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1987 A1 - De Ayala, R. J., A1 - Koch, W. R. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Washington DC ER - TY - CONF T1 - Computerized adaptive testing with the rating scale model T2 - Paper presented at the Fourth International Objective Measurement Workshop Y1 - 1987 A1 - Dodd, B. G. JF - Paper presented at the Fourth International Objective Measurement Workshop CY - Chicago ER - TY - ABST T1 - Properties of some Bayesian scoring procedures for computerized adaptive tests (Research Memorandum CRM 87-161) Y1 - 1987 A1 - Divgi, D. R. CY - Alexandria VA: Center for Naval Analyses ER - TY - ABST T1 - Determining the sensitivity of CAT-ASVAB scores to changes in item response curves with the medium of administration (Report No.86-189) Y1 - 1986 A1 - Divgi, D. R. CY - Alexandria VA: Center for Naval Analyses N1 - #DI86-189 ER - TY - JOUR T1 - ALPHATAB: A lookup table for Bayesian computerized adaptive testing JF - Applied Psychological Measurement Y1 - 1985 A1 - De Ayala, R. J., A1 - Koch, W. R. VL - 9 ER - TY - CONF T1 - Computerized adaptive attitude measurement T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1985 A1 - Koch, W. R. A1 - Dodd, B. G. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Chicago ER - TY - JOUR T1 - Implications for altering the context in which test items appear: A historical perspective on an immediate concern JF - Review of Educational Research Y1 - 1985 A1 - Leary, L. F. A1 - Dorans, N. J. VL - 55 ER - TY - JOUR T1 - Item Location Effects and Their Implications for IRT Equating and Adaptive Testing JF - Applied Psychological Measurement Y1 - 1984 A1 - Kingston, N. M. A1 - Dorans, N. J. VL - 8 IS - 2 ER - TY - ABST T1 - Tailored testing, its theory and practice. Part I: The basic model, the normal ogive submodels, and the tailored testing algorithm (NPRDC TR-83-00) Y1 - 1983 A1 - Urry, V. W. A1 - Dorans, N. 
J. CY - San Diego CA: Navy Personnel Research and Development Center ER - TY - ABST T1 - A comparison of two methods of interactive testing: Final report. Y1 - 1981 A1 - Nicewander, W. A. A1 - Chang, H. S. A1 - Doody, E. N. CY - National Institute of Education Grant 79-1045 ER - TY - CONF T1 - Group tailored tests and some problems of their utilization T2 - Third International Symposium on Educational Testing Y1 - 1977 A1 - Lewy, A. A1 - Doron, R. JF - Third International Symposium on Educational Testing CY - Leyden, The Netherlands ER - TY - JOUR T1 - Hardware and software evolution of an adaptive ability measurement system JF - Behavior Research Methods and Instrumentation Y1 - 1976 A1 - DeWitt, L. J. A1 - Weiss, D. J. VL - 8 ER - TY - ABST T1 - Best test design and self-tailored testing (Research Memorandum No. 19) Y1 - 1975 A1 - Wright, B. D. A1 - Douglas, G. A. CY - Chicago: University of Chicago, Department of Education, Statistical Laboratory. ER - TY - ABST T1 - A computer software system for adaptive ability measurement (Research Report 74-1) Y1 - 1974 A1 - De Witt, J. J. A1 - Weiss, D. J. CY - Minneapolis MN: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory ER - TY - ABST T1 - Computer-based adaptive testing models for the Air Force technical training environment: Phase I: Development of a computerized measurement system for Air Force technical training Y1 - 1974 A1 - Hansen, D. N. A1 - Johnson, B. F. A1 - Fagan, R. L. A1 - Tan, P. A1 - Dick, W. CY - JSAS Catalogue of Selected Documents in Psychology, 5, 1-86 (MS No. 882). AFHRL Technical Report 74-48. ER -