%0 Journal Article %J Journal of Computerized Adaptive Testing %D 2024 %T The Influence of Computerized Adaptive Testing on Psychometric Theory and Practice %A Reckase, Mark D. %K computerized adaptive testing %K Item Response Theory %K paradigm shift %K scaling theory %K test design %X

The major premise of this article is that part of the stimulus for the evolution of psychometric theory since the 1950s was the introduction of the concept of computerized adaptive testing (CAT) or its earlier non-CAT variations. The conceptual underpinning of CAT that had the most influence on psychometric theory was the shift of emphasis from the test (or test score) as the focus of analysis to the test item (or item score). This change in focus allowed a change in the way that test results are conceived of as measurements. It also resolved the conflict among a number of ideas that were present in the early work on psychometric theory. Some of the conflicting ideas are summarized below to show how work on the development of CAT resolved them.


%B Journal of Computerized Adaptive Testing %V 11 %G English %U https://jcatpub.net/index.php/jcat/issue/view/34/9 %N 1 %R 10.7333/2403-1101001 %0 Journal Article %J Journal of Educational Measurement %D 2020 %T Item Selection and Exposure Control Methods for Computerized Adaptive Testing with Multidimensional Ranking Items %A Chen, Chia-Wen %A Wang, Wen-Chung %A Chiu, Ming Ming %A Ro, Sage %X The use of computerized adaptive testing algorithms for ranking items (e.g., college preferences, career choices) involves two major challenges: unacceptably high computation times (selecting from a large item pool with many dimensions) and biased results (enhanced preferences or intensified examinee responses because of repeated statements across items). To address these issues, we introduce subpool partition strategies for item selection and within-person statement exposure control procedures. Simulations showed that the multinomial method reduces computation time while maintaining measurement precision. Both the freeze and revised Sympson-Hetter online (RSHO) methods controlled the statement exposure rate; RSHO sacrificed some measurement precision but increased pool use. Furthermore, preventing a statement's repetition on consecutive items neither hindered the effectiveness of the freeze and RSHO methods nor reduced measurement precision. %B Journal of Educational Measurement %V 57 %P 343-369 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12252 %R 10.1111/jedm.12252 %0 Journal Article %J Educational and Psychological Measurement %D 2020 %T The Optimal Item Pool Design in Multistage Computerized Adaptive Tests With the p-Optimality Method %A Lihong Yang %A Mark D. Reckase %X The present study extended the p-optimality method to the multistage computerized adaptive test (MST) context in developing optimal item pools to support different MST panel designs under different test configurations.
Using the Rasch model, simulated optimal item pools were generated with and without practical constraints of exposure control. A total of 72 simulated optimal item pools were generated and evaluated by an overall sample and conditional sample using various statistical measures. Results showed that the optimal item pools built with the p-optimality method provide sufficient measurement accuracy under all simulated MST panel designs. Exposure control affected the item pool size, but not the item distributions or item pool characteristics. This study demonstrated that the p-optimality method can adapt to MST item pool design, facilitate the MST assembly process, and improve its scoring accuracy. %B Educational and Psychological Measurement %V 80 %P 955-974 %U https://doi.org/10.1177/0013164419901292 %R 10.1177/0013164419901292 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2019 %T How Adaptive Is an Adaptive Test: Are All Adaptive Tests Adaptive? %A Mark Reckase %A Unhee Ju %A Sewon Kim %K computerized adaptive test %K multistage test %K statistical indicators of amount of adaptation %B Journal of Computerized Adaptive Testing %V 7 %P 1-14 %G English %U http://iacat.org/jcat/index.php/jcat/article/view/69/34 %N 1 %R 10.7333/1902-0701001 %0 Journal Article %J Applied Psychological Measurement %D 2019 %T An Investigation of Exposure Control Methods With Variable-Length CAT Using the Partial Credit Model %A Audrey J. Leroux %A J. Kay Waid-Ebbs %A Pey-Shan Wen %A Drew A. Helmer %A David P. Graham %A Maureen K. O’Connor %A Kathleen Ray %X The purpose of this simulation study was to investigate the effect of several different item exposure control procedures in computerized adaptive testing (CAT) with variable-length stopping rules using the partial credit model. Previous simulation studies on CAT exposure control methods with polytomous items rarely considered variable-length tests.
The four exposure control techniques examined were the randomesque with a group of three items, randomesque with a group of six items, progressive-restricted standard error (PR-SE), and no exposure control. The two variable-length stopping rules included were the SE and predicted standard error reduction (PSER), along with three item pools of varied sizes (43, 86, and 172 items). Descriptive statistics on number of nonconvergent cases, measurement precision, testing burden, item overlap, item exposure, and pool utilization were calculated. Results revealed that the PSER stopping rule administered fewer items on average while maintaining measurement precision similar to the SE stopping rule across the different item pool sizes and exposure controls. The PR-SE exposure control procedure surpassed the randomesque methods by further reducing test overlap, maintaining maximum exposure rates at the target rate or lower, and utilizing all items from the pool with a minimal increase in number of items administered and nonconvergent cases. %B Applied Psychological Measurement %V 43 %P 624-638 %U https://doi.org/10.1177/0146621618824856 %R 10.1177/0146621618824856 %0 Journal Article %J Journal of Educational Measurement %D 2019 %T Routing Strategies and Optimizing Design for Multistage Testing in International Large-Scale Assessments %A Svetina, Dubravka %A Liaw, Yuan-Ling %A Rutkowski, Leslie %A Rutkowski, David %X Abstract This study investigates the effect of several design and administration choices on item exposure and person/item parameter recovery under a multistage test (MST) design. In a simulation study, we examine whether number-correct (NC) or item response theory (IRT) methods are differentially effective at routing students to the correct next stage(s) and whether routing choices (optimal versus suboptimal routing) have an impact on achievement precision. Additionally, we examine the impact of testlet length on both person and item recovery. 
Overall, our results suggest that no single approach works best across the studied conditions. With respect to the mean person parameter recovery, IRT scoring (via either Fisher information or preliminary EAP estimates) outperformed classical NC methods, although differences in bias and root mean squared error were generally small. Item exposure rates were found to be more evenly distributed when suboptimal routing methods were used, and item recovery (both difficulty and discrimination) was most precisely observed for items with moderate difficulties. Based on the results of the simulation study, we draw conclusions and discuss implications for practice in the context of international large-scale assessments that recently introduced adaptive assessment in the form of MST. Future research directions are also discussed. %B Journal of Educational Measurement %V 56 %P 192-213 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12206 %R 10.1111/jedm.12206 %0 Journal Article %J Applied Psychological Measurement %D 2018 %T Constructing Shadow Tests in Variable-Length Adaptive Testing %A Qi Diao %A Hao Ren %X Imposing content constraints is very important in most operational computerized adaptive testing (CAT) programs in educational measurement. Shadow test approach to CAT (Shadow CAT) offers an elegant solution to imposing statistical and nonstatistical constraints by projecting future consequences of item selection. The original form of Shadow CAT presumes fixed test lengths. The goal of the current study was to extend Shadow CAT to tests under variable-length termination conditions and evaluate its performance relative to other content balancing approaches. The study demonstrated the feasibility of constructing Shadow CAT with variable test lengths and in operational CAT programs. The results indicated the superiority of the approach compared with other content balancing methods. 
%B Applied Psychological Measurement %V 42 %P 538-552 %U https://doi.org/10.1177/0146621617753736 %R 10.1177/0146621617753736 %0 Journal Article %J Applied Psychological Measurement %D 2018 %T What Information Works Best?: A Comparison of Routing Methods %A Halil Ibrahim Sari %A Anthony Raborn %X Many item selection methods have been proposed for computerized adaptive testing (CAT) applications. However, not all of them have been used in computerized multistage testing (ca-MST). This study uses several of these item selection methods as routing methods within the ca-MST framework: maximum Fisher information (MFI), maximum likelihood weighted information (MLWI), maximum posterior weighted information (MPWI), Kullback–Leibler (KL), and posterior Kullback–Leibler (KLP). The main purpose of this study is to examine the performance of these methods when they are used as routing methods in ca-MST applications. The five information methods were tested under four ca-MST panel designs and two test lengths (30 and 60 items), using the parameters of a real item bank. Results were evaluated with overall findings (mean bias, root mean square error, correlation between true and estimated thetas, and module exposure rates) and conditional findings (conditional absolute bias, standard error of measurement, and root mean square error). It was found that test length affected the outcomes much more than the other study conditions. Under 30-item conditions, 1-3 designs outperformed the other panel designs; under 60-item conditions, 1-3-3 designs were better. Each routing method performed well under particular conditions; there was no clear best method across the studied conditions. Recommendations for routing methods under particular conditions, as well as the limitations of these results, are provided for researchers and practitioners.
%B Applied Psychological Measurement %V 42 %P 499-515 %U https://doi.org/10.1177/0146621617752990 %R 10.1177/0146621617752990 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Bayesian Perspectives on Adaptive Testing %A Wim J. van der Linden %A Bingnan Jiang %A Hao Ren %A Seung W. Choi %A Qi Diao %K Bayesian Perspective %K CAT %X

Although adaptive testing is usually treated from the perspective of maximum-likelihood parameter estimation and maximum-information item selection, a Bayesian perspective is more natural, statistically efficient, and computationally tractable. This observation holds not only for the core process of ability estimation but also for such processes as item calibration and real-time monitoring of item security. Key elements of the approach are parametric modeling of each relevant process, updating of the parameter estimates after the arrival of each new response, and optimal design of the next step.

The purpose of the symposium is to illustrate the role of Bayesian statistics in this approach. The first presentation discusses a basic Bayesian algorithm for the sequential update of any parameter in adaptive testing and illustrates the idea of Bayesian optimal design for the two processes of ability estimation and online item calibration. The second presentation generalizes the ideas to the case of adaptive testing with polytomous items. The third presentation uses the fundamental Bayesian idea of sampling from updated posterior predictive distributions (“multiple imputations”) to deal with the problem of scoring incomplete adaptive tests.

Session Video 1

Session Video 2


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B 2017 IACAT Conference %D 2017 %T How Adaptive is an Adaptive Test: Are all Adaptive Tests Adaptive? %A Mark D Reckase %K Adaptive Testing %K CAT %X

There are many different kinds of adaptive tests, but they all have the characteristic that some feature of the test is customized to the purpose of the test. In the time allotted, it is impossible to consider the adaptation of all of these types, so this address will focus on the “classic” adaptive test that matches the difficulty of the test to the capabilities of the person being tested. This address will first present information on the maximum level of adaptation that can occur and then compare the amount of adaptation that typically occurs on an operational adaptive test to that maximum. An index is proposed to summarize the amount of adaptation, and it is argued that this type of index should be reported for operational adaptive tests to show the amount of adaptation that typically occurs.

Click for Presentation Video 

%B 2017 IACAT Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1Nj-zDCKk3DvHA4Jlp1qkb2XovmHeQfxu %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Item Pool Design and Evaluation %A Mark D Reckase %A Wei He %A Jing-Ru Xu %A Xuechun Zhou %K CAT %K Item Pool Design %X

Early work on CAT tended to use existing sets of items that came from fixed-length test forms. These sets of items were selected to meet much different requirements than are needed for a CAT: decision making or covering a content domain. However, there was also some early work that suggested having items equally distributed over the range of proficiency that was of interest, or concentrated at a decision point. There was also some work showing that proficiency estimates were biased when an item pool was too easy or too hard. These early findings eventually led to work on item pool design and, more recently, on item pool evaluation. This presentation gives a brief overview of these topics to provide some context for the following presentations in this symposium.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1ZAsqm1yNZlliqxEHcyyqQ_vOSu20xxZs %0 Journal Article %J Quality of Life Research %D 2017 %T The validation of a computer-adaptive test (CAT) for assessing health-related quality of life in children and adolescents in a clinical sample: study design, methods and first results of the Kids-CAT study %A Barthel, D. %A Otto, C. %A Nolte, S. %A Meyrose, A.-K. %A Fischer, F. %A Devine, J. %A Walter, O. %A Mierke, A. %A Fischer, K. I. %A Thyen, U. %A Klein, M. %A Ankermann, T. %A Rose, M. %A Ravens-Sieberer, U. %X Recently, we developed a computer-adaptive test (CAT) for assessing health-related quality of life (HRQoL) in children and adolescents: the Kids-CAT. It measures five generic HRQoL dimensions. The aims of this article were (1) to present the study design and (2) to investigate its psychometric properties in a clinical setting. %B Quality of Life Research %V 26 %P 1105–1117 %8 May %U https://doi.org/10.1007/s11136-016-1437-9 %R 10.1007/s11136-016-1437-9 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2016 %T Effect of Imprecise Parameter Estimation on Ability Estimation in a Multistage Test in an Automatic Item Generation Context %A Colvin, Kimberly %A Keller, Lisa A %A Robin, Frederic %K Adaptive Testing %K automatic item generation %K errors in item parameters %K item clones %K multistage testing %B Journal of Computerized Adaptive Testing %V 4 %P 1-18 %G English %U http://iacat.org/jcat/index.php/jcat/article/view/59/27 %N 1 %R 10.7333/1608-040101 %0 Journal Article %J Educational Measurement: Issues and Practice. %D 2016 %T Using Response Time to Detect Item Preknowledge in Computer?Based Licensure Examinations %A Qian H. %A Staniewska, D. %A Reckase, M. %A Woo, A. 
%X This article addresses the issue of how to detect item preknowledge using item response time data in two computer-based large-scale licensure examinations. Item preknowledge is indicated by an unexpected short response time and a correct response. Two samples were used for detecting item preknowledge for each examination. The first sample was from the early stage of the operational test and was used for item calibration. The second sample was from the late stage of the operational test, which may feature item preknowledge. The purpose of this research was to explore whether there was evidence of item preknowledge and compromised items in the second sample using the parameters estimated from the first sample. The results showed that for one nonadaptive operational examination, two items (of 111) were potentially exposed, and two candidates (of 1,172) showed some indications of preknowledge on multiple items. For another licensure examination that featured computerized adaptive testing, there was no indication of item preknowledge or compromised items. Implications for detected aberrant examinees and compromised items are discussed in the article. %B Educational Measurement: Issues and Practice. %V 35 %N 1 %R http://dx.doi.org/10.1111/emip.12102 %0 Journal Article %J Applied Psychological Measurement %D 2015 %T Utilizing Response Times in Computerized Classification Testing %A Sie, Haskell %A Finkelman, Matthew D. %A Riley, Barth %A Smits, Niels %X A well-known approach in computerized mastery testing is to combine the Sequential Probability Ratio Test (SPRT) stopping rule with item selection to maximize Fisher information at the mastery threshold. This article proposes a new approach in which a time limit is defined for the test and examinees’ response times are considered in both item selection and test termination. Item selection is performed by maximizing Fisher information per time unit, rather than Fisher information itself. 
The test is terminated once the SPRT makes a classification decision, the time limit is exceeded, or there is no remaining item that has a high enough probability of being answered before the time limit. In a simulation study, the new procedure showed a substantial reduction in average testing time while slightly improving classification accuracy compared with the original method. In addition, the new procedure reduced the percentage of examinees who exceeded the time limit. %B Applied Psychological Measurement %V 39 %P 389-405 %U http://apm.sagepub.com/content/39/5/389.abstract %R 10.1177/0146621615569504 %0 Journal Article %J Educational and Psychological Measurement %D 2014 %T Item Pool Design for an Operational Variable-Length Computerized Adaptive Test %A He, Wei %A Reckase, Mark D. %X

For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good-quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution and item exposure issues. Yet, there is little research on how to design item pools to have those desirable features. The research reported in this article provided step-by-step, hands-on guidance on the item pool design process by applying the bin-and-union method to design item pools for a large-scale licensure CAT employing a complex adaptive testing algorithm with variable test length, a decision-based stopping rule, content balancing, and exposure control. The design process involved extensive simulations to identify several alternative item pool designs and evaluate their performance against a series of criteria. The design output included the desired item pool size and item parameter distribution. The results indicate that the mechanism used to identify the desirable item pool features functions well and that two recommended item pool designs would support satisfactory performance of the operational testing program.

%B Educational and Psychological Measurement %V 74 %P 473-494 %U http://epm.sagepub.com/content/74/3/473.abstract %R 10.1177/0013164413509629 %0 Journal Article %J BMC Med Res Methodol %D 2012 %T Comparison of two Bayesian methods to detect mode effects between paper-based and computerized adaptive assessments: a preliminary Monte Carlo study. %A Riley, Barth B %A Carle, Adam C %K Bayes Theorem %K Data Interpretation, Statistical %K Humans %K Mathematical Computing %K Monte Carlo Method %K Outcome Assessment (Health Care) %X

BACKGROUND: Computerized adaptive testing (CAT) is being applied to health outcome measures developed as paper-and-pencil (P&P) instruments. Differences in how respondents answer items administered by CAT vs. P&P can increase error in CAT-estimated measures if not identified and corrected.

METHOD: Two methods for detecting item-level mode effects are proposed using Bayesian estimation of posterior distributions of item parameters: (1) a modified robust Z (RZ) test, and (2) 95% credible intervals (CrI) for the CAT-P&P difference in item difficulty. A simulation study was conducted under the following conditions: (1) data-generating model (one- vs. two-parameter IRT model); (2) moderate vs. large DIF sizes; (3) percentage of DIF items (10% vs. 30%), and (4) mean difference in θ estimates across modes of 0 vs. 1 logits. This resulted in a total of 16 conditions with 10 generated datasets per condition.

RESULTS: Both methods evidenced good to excellent false positive control, with RZ providing better control of false positives and CrI providing slightly higher power, irrespective of measurement model. False positives increased when items were very easy to endorse and when there were mode differences in mean trait level. True positives were predicted by CAT item usage, absolute item difficulty, and item discrimination. RZ outperformed CrI, due to better control of false positive DIF.

CONCLUSIONS: Whereas false positives were well controlled, particularly for RZ, power to detect DIF was suboptimal. Research is needed to examine the robustness of these methods under varying prior assumptions concerning the distribution of item and person parameters and when data fail to conform to prior assumptions. False identification of DIF when items were very easy to endorse is a problem warranting additional investigation.

%B BMC Med Res Methodol %V 12 %P 124 %8 2012 %G eng %R 10.1186/1471-2288-12-124 %0 Journal Article %J Journal of Educational Measurement %D 2012 %T Detecting Local Item Dependence in Polytomous Adaptive Data %A Mislevy, Jessica L. %A Rupp, André A. %A Harring, Jeffrey R. %X

A rapidly expanding arena for item response theory (IRT) is in attitudinal and health-outcomes survey applications, often with polytomous items. In particular, there is interest in computer adaptive testing (CAT). Meeting model assumptions is necessary to realize the benefits of IRT in this setting, however. Although local item dependence has been investigated both for polytomous items in fixed-form settings and for dichotomous items in CAT settings, there have been no publications applying local item dependence detection methodology to polytomous items in CAT, despite its central importance to these applications. The current research uses a simulation study to investigate the extension of two widely used pairwise statistics, Yen's Q3 statistic and Pearson's X2 statistic, in this context. The simulation design and results are contextualized throughout with a real item bank of this type from the Patient-Reported Outcomes Measurement Information System (PROMIS).

%B Journal of Educational Measurement %V 49 %P 127–147 %U http://dx.doi.org/10.1111/j.1745-3984.2012.00165.x %R 10.1111/j.1745-3984.2012.00165.x %0 Journal Article %J Applied Psychological Measurement %D 2011 %T catR: An R Package for Computerized Adaptive Testing %A Magis, D. %A Raîche, G. %K computer program %K computerized adaptive testing %K Estimation %K Item Response Theory %X

Computerized adaptive testing (CAT) is an active research field in psychometrics and educational measurement. However, very little software is available to handle such adaptive tasks. The R package catR was developed to perform adaptive testing with as much flexibility as possible, in an attempt to provide a development and testing platform for the interested user. Several item-selection rules and ability estimators are implemented. The item bank can be provided by the user or randomly generated from parent distributions of item parameters. Three stopping rules are available. The output can be graphically displayed.

%B Applied Psychological Measurement %G eng %R 10.1177/0146621611407482 %0 Journal Article %J Journal of Applied Testing Technology %D 2011 %T Computer adaptive testing for small scale programs and instructional systems %A Rudner, L. M. %A Guo, F. %X

This study investigates measurement decision theory (MDT) as an underlying model for computer adaptive testing when the goal is to classify examinees into one of a finite number of groups. The first analysis compares MDT with a popular item response theory model and finds little difference in terms of the percentage of correct classifications. The second analysis examines the number of examinees needed to calibrate MDT item parameters and finds accurate classifications even with calibration sample sizes as small as 100 examinees.

%B Journal of Applied Testing Technology %V 12 %G English %N 1 %0 Journal Article %J Journal of Personality Assessment %D 2011 %T Computerized adaptive assessment of personality disorder: Introducing the CAT–PD project %A Simms, L. J. %A Goldberg, L .R. %A Roberts, J. E. %A Watson, D. %A Welte, J. %A Rotterman, J. H. %X Assessment of personality disorders (PD) has been hindered by reliance on the problematic categorical model embodied in the most recent Diagnostic and Statistical Model of Mental Disorders (DSM), lack of consensus among alternative dimensional models, and inefficient measurement methods. This article describes the rationale for and early results from a multiyear study funded by the National Institute of Mental Health that was designed to develop an integrative and comprehensive model and efficient measure of PD trait dimensions. To accomplish these goals, we are in the midst of a 5-phase project to develop and validate the model and measure. The results of Phase 1 of the project—which was focused on developing the PD traits to be assessed and the initial item pool—resulted in a candidate list of 59 PD traits and an initial item pool of 2,589 items. Data collection and structural analyses in community and patient samples will inform the ultimate structure of the measure, and computerized adaptive testing will permit efficient measurement of the resultant traits. The resultant Computerized Adaptive Test of Personality Disorder (CAT–PD) will be well positioned as a measure of the proposed DSM–5 PD traits. Implications for both applied and basic personality research are discussed. %B Journal of Personality Assessment %V 93 %P 380-389 %@ 0022-3891 %G eng %0 Journal Article %J Journal of Applied Testing Technology %D 2011 %T Design of a Computer-Adaptive Test to Measure English Literacy and Numeracy in the Singapore Workforce: Considerations, Benefits, and Implications %A Jacobsen, J. %A Ackermann, R. %A Egüez, J. %A Ganguli, D. %A Rickard, P. 
%A Taylor, L. %X

A computer adaptive test (CAT) is a delivery methodology that serves the larger goals of the assessment system in which it is embedded. A thorough analysis of the assessment system for which a CAT is being designed is critical to ensure that the delivery platform is appropriate and addresses all relevant complexities. As such, a CAT engine must be designed to conform to the validity and reliability of the overall system. This design takes the form of adherence to the assessment goals and objectives of the adaptive assessment system. When the assessment is adapted for use in another country, consideration must be given to any necessary revisions, including content differences. This article addresses these considerations while drawing, in part, on the process followed in the development of the CAT delivery system designed to test English language workplace skills for the Singapore Workforce Development Agency. Topics include item creation and selection, calibration of the item pool, analysis and testing of the psychometric properties, and reporting and interpretation of scores. The characteristics and benefits of the CAT delivery system are detailed, as well as implications for testing programs considering the use of a CAT delivery system.

%B Journal of Applied Testing Technology %V 12 %G English %U http://www.testpublishers.org/journal-of-applied-testing-technology %N 1 %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T Detecting DIF between Conventional and Computerized Adaptive Testing: A Monte Carlo Study %A Barth B. Riley %A Adam C. Carle %K 95% Credible Interval %K CAT %K DIF %K differential item function %K modified robust Z statistic %K Monte Carlo methodologies %X

Two procedures for detecting differential item functioning (DIF), the modified robust Z statistic and the 95% credible interval, were compared in a Monte Carlo study. Both procedures evidenced adequate control of false positive DIF results.

%B Annual Conference of the International Association for Computerized Adaptive Testing %8 10/2011 %G eng %0 Generic %D 2011 %T Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): depression, anxiety, and anger %A Pilkonis, P. A. %A Choi, S. W. %A Reise, S. P. %A Stover, A. M. %A Riley, W. T. %A Cella, D. %B Assessment %@ 1073-1911 %G eng %& June 21, 2011 %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T A Test Assembly Model for MST %A Angela Verschoor %A Ingrid Radtke %A Theo Eggen %K CAT %K mst %K multistage testing %K Rasch %K routing %K tif %X

This study is a brief exploration of the optimization of an MST. It is extremely hard, and perhaps impossible, to chart the influence of the item pool and test specifications on the optimization process. Simulations are very helpful in finding an acceptable MST.

%B Annual Conference of the International Association for Computerized Adaptive Testing %8 10/2011 %G eng %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T The Use of Decision Trees for Adaptive Item Selection and Score Estimation %A Barth B. Riley %A Rodney Funk %A Michael L. Dennis %A Richard D. Lennox %A Matthew Finkelman %K adaptive item selection %K CAT %K decision tree %X

Post-hoc simulations were conducted comparing the relative efficiency and precision of decision trees (using CHAID and CART) vs. IRT-based CAT.

Conclusions: Decision tree methods were more efficient than CAT, but the two approaches select items differently. CAT selects items based on two criteria: item location relative to the current estimate of theta, and item discrimination. Decision trees select items that best discriminate between groups defined by the total score. CAT is optimal only when the trait level is well estimated. Findings suggest that combining a decision tree followed by CAT item selection may be advantageous.

%B Annual Conference of the International Association for Computerized Adaptive Testing %8 10/2011 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2010 %T A Comparison of Content-Balancing Procedures for Estimating Multiple Clinical Domains in Computerized Adaptive Testing: Relative Precision, Validity, and Detection of Persons With Misfitting Responses %A Barth B. Riley %A Michael L. Dennis %A Conrad, Kendon J. %X

This simulation study sought to compare four different computerized adaptive testing (CAT) content-balancing procedures designed for use in a multidimensional assessment with respect to measurement precision, symptom severity classification, validity of clinical diagnostic recommendations, and sensitivity to atypical responding. The four content-balancing procedures were (a) no content balancing, (b) screener-based, (c) mixed (screener plus content balancing), and (d) full content balancing. In full content balancing and in mixed content balancing following administration of the screener items, item selection was based on (a) whether the target number of items for the item’s subscale was reached and (b) the item’s information function. Mixed and full content balancing provided the best representation of items from each of the main subscales of the Internal Mental Distress Scale. These procedures also resulted in higher CAT to full-scale correlations for the Trauma and Homicidal/Suicidal Thought subscales and improved detection of atypical responding.

%B Applied Psychological Measurement %V 34 %P 410-423 %U http://apm.sagepub.com/content/34/6/410.abstract %R 10.1177/0146621609349802 %0 Journal Article %J Psychological Test and Assessment Modeling %D 2010 %T Designing item pools to optimize the functioning of a computerized adaptive test %A Reckase, M. D. %X Computerized adaptive testing (CAT) is a testing procedure that can result in improved precision for a specified test length or reduced test length with no loss of precision. 
However, these attractive psychometric features of CATs are only achieved if appropriate test items are available for administration. This set of test items is commonly called an “item pool.” This paper discusses the optimal characteristics for an item pool that will lead to the desired properties for a CAT. Then, a procedure is described for designing the statistical characteristics of the item parameters for an optimal item pool within an item response theory framework. Because true optimality is impractical, methods for achieving practical approximations to optimality are described. The results of this approach are shown for an operational testing program including comparisons to the results from the item pool currently used in that testing program. %B Psychological Test and Assessment Modeling %V 52 %P 127-141 %@ 2190-0507 %G eng %0 Journal Article %J Quality of Life Research %D 2010 %T Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms %A Choi, S. %A Reise, S. P. %A Pilkonis, P. A. %A Hays, R. D. %A Cella, D. %B Quality of Life Research %V 19(1) %P 125–136 %G eng %0 Journal Article %J Journal of Applied Measurement %D 2010 %T Features of the sampling distribution of the ability estimate in computerized adaptive testing according to two stopping rules %A Blais, J. G. %A Raiche, G. %X Whether paper and pencil or computerized adaptive, tests are usually described by a set of rules managing how they are administered: which item will be first, which should follow any given item, and when to administer the last one. This article focuses on the latter and looks at the effect of two stopping rules on the estimated sampling distribution of the ability estimate in a CAT: the number of items administered and the a priori determined size of the standard error of the ability estimate. 
%B Journal of Applied Measurement %7 2010/12/18 %V 11 %P 424-31 %@ 1529-7713 (Print)1529-7713 (Linking) %G eng %M 21164229 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Implementing the Graduate Management Admission Test Computerized Adaptive Test %A Rudner, L. M. %B Elements of Adaptive Testing %P 151-166 %G eng %& 8 %R 10.1007/978-0-387-85461-8 %0 Journal Article %J Journal of Applied Measurement %D 2010 %T The use of PROMIS and assessment center to deliver patient-reported outcome measures in clinical research %A Gershon, R. C. %A Rothrock, N. %A Hanrahan, R. %A Bass, M. %A Cella, D. %X The Patient-Reported Outcomes Measurement Information System (PROMIS) was developed as one of the first projects funded by the NIH Roadmap for Medical Research Initiative to re-engineer the clinical research enterprise. The primary goal of PROMIS is to build item banks and short forms that measure key health outcome domains that are manifested in a variety of chronic diseases which could be used as a "common currency" across research projects. To date, item banks, short forms and computerized adaptive tests (CAT) have been developed for 13 domains with relevance to pediatric and adult subjects. To enable easy delivery of these new instruments, PROMIS built a web-based resource (Assessment Center) for administering CATs and other self-report data, tracking item and instrument development, monitoring accrual, managing data, and storing statistical analysis results. Assessment Center can also be used to deliver custom researcher developed content, and has numerous features that support both simple and complicated accrual designs (branching, multiple arms, multiple time points, etc.). This paper provides an overview of the development of the PROMIS item banks and details Assessment Center functionality. 
%B Journal of Applied Measurement %V 11 %P 304-314 %@ 1529-7713 %G eng %0 Generic %D 2010 %T Validation of a computer-adaptive test to evaluate generic health-related quality of life %A Rebollo, P. %A Castejon, I. %A Cuervo, J. %A Villa, G. %A Garcia-Cueto, E. %A Diaz-Cuervo, H. %A Zardain, P. C. %A Muniz, J. %A Alonso, J. %X BACKGROUND: Health Related Quality of Life (HRQoL) is a relevant variable in the evaluation of health outcomes. Questionnaires based on Classical Test Theory typically require a large number of items to evaluate HRQoL. Computer Adaptive Testing (CAT) can be used to reduce tests length while maintaining and, in some cases, improving accuracy. This study aimed at validating a CAT based on Item Response Theory (IRT) for evaluation of generic HRQoL: the CAT-Health instrument. METHODS: Cross-sectional study of subjects aged over 18 attending Primary Care Centres for any reason. CAT-Health was administered along with the SF-12 Health Survey. Age, gender and a checklist of chronic conditions were also collected. CAT-Health was evaluated considering: 1) feasibility: completion time and test length; 2) content range coverage, Item Exposure Rate (IER) and test precision; and 3) construct validity: differences in the CAT-Health scores according to clinical variables and correlations between both questionnaires. RESULTS: 396 subjects answered CAT-Health and SF-12, 67.2% females, mean age (SD) 48.6 (17.7) years. 36.9% did not report any chronic condition. Median completion time for CAT-Health was 81 seconds (IQ range = 59-118) and it increased with age (p < 0.001). The median number of items administered was 8 (IQ range = 6-10). Neither ceiling nor floor effects were found for the score. None of the items in the pool had an IER of 100% and it was over 5% for 27.1% of the items. Test Information Function (TIF) peaked between levels -1 and 0 of HRQoL. 
Statistically significant differences were observed in the CAT-Health scores according to the number and type of conditions. CONCLUSIONS: Although domain-specific CATs exist for various areas of HRQoL, CAT-Health is one of the first IRT-based CATs designed to evaluate generic HRQoL and it has proven feasible, valid and efficient, when administered to a broad sample of individuals attending primary care settings. %B Health and Quality of Life Outcomes %7 2010/12/07 %V 8 %P 147 %@ 1477-7525 (Electronic)1477-7525 (Linking) %G eng %M 21129169 %2 3022567 %0 Book Section %D 2009 %T Applications of CAT in admissions to higher education in Israel: Twenty-two years of experience %A Gafni, N. %A Cohen, Y. %A Roded, K %A Baumer, M %A Moshinsky, A. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T Assessing the equivalence of Internet-based vs. paper-and-pencil psychometric tests. %A Baumer, M %A Roded, K %A Gafni, N. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T Comparison of ability estimation and item selection methods in multidimensional computerized adaptive testing %A Diao, Q. %A Reckase, M. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Journal of Educational Measurement %D 2009 %T A Conditional Exposure Control Method for Multidimensional Adaptive Testing %A Matthew Finkelman %A Nering, Michael L. %A Roussos, Louis A. %X

In computerized adaptive testing (CAT), ensuring the security of test items is a crucial practical consideration. A common approach to reducing item theft is to define maximum item exposure rates, i.e., to limit the proportion of examinees to whom a given item can be administered. Numerous methods for controlling exposure rates have been proposed for tests employing the unidimensional 3-PL model. The present article explores the issues associated with controlling exposure rates when a multidimensional item response theory (MIRT) model is utilized and exposure rates must be controlled conditional upon ability. This situation is complicated by the exponentially increasing number of possible ability values in multiple dimensions. The article introduces a new procedure, called the generalized Stocking-Lewis method, that controls the exposure rate for students of comparable ability as well as with respect to the overall population. A realistic simulation set compares the new method with three other approaches: Kullback-Leibler information with no exposure control, Kullback-Leibler information with unconditional Sympson-Hetter exposure control, and random item selection.

%B Journal of Educational Measurement %V 46 %P 84–103 %U http://dx.doi.org/10.1111/j.1745-3984.2009.01070.x %R 10.1111/j.1745-3984.2009.01070.x %0 Journal Article %J Journal of Applied Measurement %D 2009 %T Considerations about expected a posteriori estimation in adaptive testing: adaptive a priori, adaptive correction for bias, and adaptive integration interval %A Raiche, G. %A Blais, J. G. %K *Bias (Epidemiology) %K *Computers %K Data Interpretation, Statistical %K Models, Statistical %X In a computerized adaptive test, we would like to obtain an acceptable precision of the proficiency level estimate using an optimal number of items. Unfortunately, decreasing the number of items is accompanied by a certain degree of bias when the true proficiency level differs significantly from the a priori estimate. The authors suggest that it is possible to reduce the bias, and even the standard error of the estimate, by applying to each provisional estimation one or a combination of the following strategies: adaptive correction for bias proposed by Bock and Mislevy (1982), adaptive a priori estimate, and adaptive integration interval. %B Journal of Applied Measurement %7 2009/07/01 %V 10 %P 138-56 %@ 1529-7713 (Print)1529-7713 (Linking) %G eng %M 19564695 %0 Journal Article %J International Journal for Methods in Psychiatric Research %D 2009 %T Evaluation of a computer-adaptive test for the assessment of depression (D-CAT) in clinical application %A Fliege, H. %A Becker, J. %A Walter, O. B. %A Rose, M. %A Bjorner, J. B. %A Klapp, B. F. %X In the past, a German Computerized Adaptive Test, based on Item Response Theory (IRT), was developed for purposes of assessing the construct depression [Computer-adaptive test for depression (D-CAT)]. 
This study aims at testing the feasibility and validity of the real computer-adaptive application. The D-CAT, supplied by a bank of 64 items, was administered on personal digital assistants (PDAs) to 423 consecutive patients suffering from psychosomatic and other medical conditions (78 with depression). Items were adaptively administered until a predetermined reliability (r >/= 0.90) was attained. For validation purposes, the Hospital Anxiety and Depression Scale (HADS), the Centre for Epidemiological Studies Depression (CES-D) scale, and the Beck Depression Inventory (BDI) were administered. Another sample of 114 patients was evaluated using standardized diagnostic interviews [Composite International Diagnostic Interview (CIDI)]. The D-CAT was quickly completed (mean 74 seconds), well accepted by the patients and reliable after an average administration of only six items. In 95% of the cases, 10 items or less were needed for a reliable score estimate. Correlations between the D-CAT and the HADS, CES-D, and BDI ranged between r = 0.68 and r = 0.77. The D-CAT distinguished between diagnostic groups as well as established questionnaires do. The D-CAT proved an efficient, well accepted and reliable tool. Discriminative power was comparable to other depression measures, whereby the CAT is shorter and more precise. Item usage raises questions of balancing the item selection for content in the future. Copyright (c) 2009 John Wiley & Sons, Ltd. %B International Journal for Methods in Psychiatric Research %7 2009/02/06 %V 18 %P 233-236 %8 Feb 4 %@ 1049-8931 (Print) %G Eng %M 19194856 %0 Journal Article %J Journal of Clinical Epidemiology %D 2009 %T An evaluation of patient-reported outcomes found computerized adaptive testing was efficient in assessing stress perception %A Kocalevent, R. D. %A Rose, M. %A Becker, J. %A Walter, O. B. %A Fliege, H. %A Bjorner, J. B. %A Kleiber, D. %A Klapp, B. F. 
%K *Diagnosis, Computer-Assisted %K Adolescent %K Adult %K Aged %K Aged, 80 and over %K Confidence Intervals %K Female %K Humans %K Male %K Middle Aged %K Perception %K Quality of Health Care/*standards %K Questionnaires %K Reproducibility of Results %K Sickness Impact Profile %K Stress, Psychological/*diagnosis/psychology %K Treatment Outcome %X OBJECTIVES: This study aimed to develop and evaluate a first computerized adaptive test (CAT) for the measurement of stress perception (Stress-CAT), in terms of the two dimensions: exposure to stress and stress reaction. STUDY DESIGN AND SETTING: Item response theory modeling was performed using a two-parameter model (Generalized Partial Credit Model). The evaluation of the Stress-CAT comprised a simulation study and real clinical application. A total of 1,092 psychosomatic patients (N1) were studied. Two hundred simulees (N2) were generated for a simulated response data set. Then the Stress-CAT was given to n=116 inpatients (N3), together with established stress questionnaires as validity criteria. RESULTS: The final banks included n=38 stress exposure items and n=31 stress reaction items. In the first simulation study, CAT scores could be estimated with a high measurement precision (SE<0.32; rho>0.90) using 7.0+/-2.3 (M+/-SD) stress reaction items and 11.6+/-1.7 stress exposure items. The second simulation study reanalyzed real patients' data (N1) and showed an average use of items of 5.6+/-2.1 for the dimension stress reaction and 10.0+/-4.9 for the dimension stress exposure. Convergent validity showed significantly high correlations. CONCLUSIONS: The Stress-CAT is short and precise, potentially lowering the response burden of patients in clinical decision making. %B Journal of Clinical Epidemiology %7 2008/07/22 %V 62 %P 278-287 %@ 1878-5921 (Electronic)0895-4356 (Linking) %G eng %M 18639439 %0 Book Section %D 2009 %T An examination of decision-theory adaptive testing procedures %A Rudner, L. M. 
%X This research examined three ways to adaptively select items using decision theory: a traditional decision theory sequential testing approach (expected minimum cost), information gain (modeled after Kullback-Leibler), and a maximum discrimination approach, and then compared them all against an approach using maximum IRT Fisher information. It also examined the use of Wald’s (1947) well-known sequential probability ratio test, SPRT, as a test termination rule in this context. The minimum cost approach was notably better than the best-case possibility for IRT. Information gain, which is based on entropy and comes from information theory, was almost identical to minimum cost. The simple approach using the item that best discriminates between the two most likely classifications also fared better than IRT, but not as well as information gain or minimum cost. Through Wald’s SPRT, large percentages of examinees can be accurately classified with very few items. With only 25 sequentially selected items, for example, approximately 90% of the simulated NAEP examinees were classified with 86% accuracy. The advantages of the decision theory model are many: the model yields accurate mastery state classifications, can use a small item pool, is simple to implement, requires little pretesting, is applicable to criterion-referenced tests, can be used in diagnostic testing, can be adapted to yield classifications on multiple skills, and should be easy to explain to non-statisticians. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Annual Review of Clinical Psychology %D 2009 %T Item response theory and clinical measurement %A Reise, S. P. %A Waller, N. G. 
%K *Psychological Theory %K Humans %K Mental Disorders/diagnosis/psychology %K Psychological Tests %K Psychometrics %K Quality of Life %K Questionnaires %X In this review, we examine studies that use item response theory (IRT) to explore the psychometric properties of clinical measures. Next, we consider how IRT has been used in clinical research for: scale linking, computerized adaptive testing, and differential item functioning analysis. Finally, we consider the scale properties of IRT trait scores. We conclude that there are notable differences between cognitive and clinical measures that have relevance for IRT modeling. Future research should be directed toward a better understanding of the metric of the latent trait and the psychological processes that lead to individual differences in item response behaviors. %B Annual Review of Clinical Psychology %7 2008/11/04 %V 5 %P 27-48 %@ 1548-5951 (Electronic) %G eng %M 18976138 %0 Journal Article %J Applied Psychological Measurement %D 2009 %T I've Fallen and I Can't Get Up: Can High-Ability Students Recover From Early Mistakes in CAT? %A Rulison, Kelly L. %A Loken, Eric %X

A difficult result to interpret in Computerized Adaptive Tests (CATs) occurs when an ability estimate initially drops and then ascends continuously until the test ends, suggesting that the true ability may be higher than implied by the final estimate. This study explains why this asymmetry occurs and shows that early mistakes by high-ability students can lead to considerable underestimation, even in tests with 45 items. The opposite response pattern, where low-ability students start with lucky guesses, leads to much less bias. The authors show that using Barton and Lord's four-parameter model (4PM) and a less informative prior can lower bias and root mean square error (RMSE) for high-ability students with a poor start, as the CAT algorithm ascends more quickly after initial underperformance. Results also show that the 4PM slightly outperforms a CAT in which less discriminating items are initially used. The practical implications and relevance for psychological measurement more generally are discussed.

%B Applied Psychological Measurement %V 33 %P 83-101 %U http://apm.sagepub.com/content/33/2/83.abstract %R 10.1177/0146621608324023 %0 Journal Article %J Quality of Life Research %D 2009 %T Logistics of collecting patient-reported outcomes (PROs) in clinical practice: an overview and practical examples %A Rose, M. %A Bezjak, A. %X PURPOSE: Interest in collecting patient-reported outcomes (PROs), such as health-related quality of life (HRQOL), health status reports, and patient satisfaction is on the rise and practical aspects of collecting PROs in clinical practice are becoming more important. The purpose of this paper is to draw the attention to a number of issues relevant for a successful integration of PRO measures into the daily work flow of busy clinical settings. METHODS: The paper summarizes the results from a breakout session held at an ISOQOL special topic conference for PRO measures in clinical practice in 2007. RESULTS: Different methodologies of collecting PROs are discussed, and the support needed for each methodology is highlighted. The discussion is illustrated by practical real-life examples from early adaptors who administered paper-pencil, or electronic PRO assessments (ePRO) for more than a decade. The paper also reports about new experiences with more recent technological developments, such as SmartPens and Computer Adaptive Tests (CATs) in daily practice. CONCLUSIONS: Methodological and logistical issues determine the resources needed for a successful integration of PRO measures into daily work flow procedures and influence significantly the usefulness of PRO data for clinical practice. %B Quality of Life Research %7 2009/01/20 %V 18 %P 125-36 %8 Feb %@ 0962-9343 (Print) %G eng %M 19152119 %0 Journal Article %J Journal of Applied Measurement %D 2009 %T a posteriori estimation in adaptive testing: Adaptive a priori, adaptive correction for bias, and adaptive integration interval %A Raîche, G. %A Blais, J-G. 
%B Journal of Applied Measurement %V 10(2) %G eng %0 Journal Article %J Journal of Rheumatology %D 2009 %T Progress in assessing physical function in arthritis: PROMIS short forms and computerized adaptive testing %A Fries, J.F. %A Cella, D. %A Rose, M. %A Krishnan, E. %A Bruce, B. %K *Disability Evaluation %K *Outcome Assessment (Health Care) %K Arthritis/diagnosis/*physiopathology %K Health Surveys %K Humans %K Prognosis %K Reproducibility of Results %X OBJECTIVE: Assessing self-reported physical function/disability with the Health Assessment Questionnaire Disability Index (HAQ) and other instruments has become central in arthritis research. Item response theory (IRT) and computerized adaptive testing (CAT) techniques can increase reliability and statistical power. IRT-based instruments can improve measurement precision substantially over a wider range of disease severity. These modern methods were applied and the magnitude of improvement was estimated. METHODS: A 199-item physical function/disability item bank was developed by distilling 1865 items to 124, including Legacy Health Assessment Questionnaire (HAQ) and Physical Function-10 items, and improving precision through qualitative and quantitative evaluation in over 21,000 subjects, which included about 1500 patients with rheumatoid arthritis and osteoarthritis. Four new instruments, (A) Patient-Reported Outcomes Measurement Information (PROMIS) HAQ, which evolved from the original (Legacy) HAQ; (B) "best" PROMIS 10; (C) 20-item static (short) forms; and (D) simulated PROMIS CAT, which sequentially selected the most informative item, were compared with the HAQ. RESULTS: Online and mailed administration modes yielded similar item and domain scores. The HAQ and PROMIS HAQ 20-item scales yielded greater information content versus other scales in patients with more severe disease. The "best" PROMIS 20-item scale outperformed the other 20-item static forms over a broad range of 4 standard deviations. 
The 10-item simulated PROMIS CAT outperformed all other forms. CONCLUSION: Improved items and instruments yielded better information. The PROMIS HAQ is currently available and considered validated. The new PROMIS short forms, after validation, are likely to represent further improvement. CAT-based physical function/disability assessment offers superior performance over static forms of equal length. %B Journal of Rheumatology %7 2009/09/10 %V 36 %P 2061-2066 %8 Sep %@ 0315-162X (Print)0315-162X (Linking) %G eng %M 19738214 %0 Conference Paper %B Joint Meeting on Adolescent Treatment Effectiveness %D 2008 %T Developing a progressive approach to using the GAIN in order to reduce the duration and cost of assessment with the GAIN short screener, Quick and computer adaptive testing %A Dennis, M. L. %A Funk, R. %A Titus, J. %A Riley, B. B. %A Hosman, S. %A Kinne, S. %B Joint Meeting on Adolescent Treatment Effectiveness %C Washington D.C., USA %8 2008 %G eng %( 2008 %) ADDED 1 Aug 2008 %F 205795 %0 Journal Article %J Depression and Anxiety %D 2008 %T Functioning and validity of a computerized adaptive test to measure anxiety (A CAT) %A Becker, J. %A Fliege, H. %A Kocalevent, R. D. %A Bjorner, J. B. %A Rose, M. %A Walter, O. B. %A Klapp, B. F. %X Background: The aim of this study was to evaluate the Computerized Adaptive Test to measure anxiety (A-CAT), a patient-reported outcome questionnaire that uses computerized adaptive testing to measure anxiety. Methods: The A-CAT builds on an item bank of 50 items that has been built using conventional item analyses and item response theory analyses. The A-CAT was administered on Personal Digital Assistants to n=357 patients diagnosed and treated at the department of Psychosomatic Medicine and Psychotherapy, Charité Berlin, Germany. For validation purposes, two subgroups of patients (n=110 and 125) answered the A-CAT along with established anxiety and depression questionnaires. 
Results: The A-CAT was fast to complete (on average in 2 min, 38 s) and a precise item response theory based CAT score (reliability>.9) could be estimated after 4–41 items. On average, the CAT displayed 6 items (SD=4.2). Convergent validity of the A-CAT was supported by correlations to existing tools (Hospital Anxiety and Depression Scale-A, Beck Anxiety Inventory, Berliner Stimmungs-Fragebogen A/D, and State Trait Anxiety Inventory: r=.56–.66); discriminant validity between diagnostic groups was higher for the A-CAT than for other anxiety measures. Conclusions: The German A-CAT is an efficient, reliable, and valid tool for assessing anxiety in patients suffering from anxiety disorders and other conditions with significant potential for initial assessment and long-term treatment monitoring. Future research directions are to explore content balancing of the item selection algorithm of the CAT, to norm the tool to a healthy sample, and to develop practical cutoff scores. Depression and Anxiety, 2008. © 2008 Wiley-Liss, Inc. %B Depression and Anxiety %V 25 %P E182-E194 %@ 1520-6394 %G eng %0 Book Section %D 2007 %T Adaptive estimators of trait level in adaptive testing: Some proposals %A Raîche, G. %A Blais, J. G. %A Magis, D. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J International Journal of Testing %D 2007 %T A “Rearrangement Procedure” For Scoring Adaptive Tests with Review Options %A Papanastasiou, Elena C. %A Reckase, Mark D. %B International Journal of Testing %V 7 %P 387-407 %U http://www.tandfonline.com/doi/abs/10.1080/15305050701632262 %R 10.1080/15305050701632262 %0 Conference Proceedings %B GMAC Conference on Computerized Adaptive Testing %D 2007 %T Computerized classification testing with composite hypotheses %A Thompson, N. A. %A Ro, S. %K computerized adaptive testing %B GMAC Conference on Computerized Adaptive Testing %I Graduate Management Admissions Council %C St. 
Paul, MN %G eng %0 Conference Paper %B Paper presented at the international meeting of the Psychometric Society %D 2007 %T Cutscore location and classification accuracy in computerized classification testing %A Ro, S. %A Thompson, N. A. %B Paper presented at the international meeting of the Psychometric Society %C Tokyo, Japan %G eng %0 Book Section %D 2007 %T The design of p-optimal item banks for computerized adaptive tests %A Reckase, M. D. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. {PDF file, 211 KB}. %G eng %0 Book Section %D 2007 %T Designing optimal item pools for computerized adaptive tests with Sympson-Hetter exposure control %A Gu, L. %A Reckase, M. D. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing %G eng %0 Journal Article %J Quality of Life Research %D 2007 %T Developing tailored instruments: item banking and computerized adaptive assessment %A Bjorner, J. B. %A Chang, C-H. %A Thissen, D. %A Reeve, B. B. %K *Health Status %K *Health Status Indicators %K *Mental Health %K *Outcome Assessment (Health Care) %K *Quality of Life %K *Questionnaires %K *Software %K Algorithms %K Factor Analysis, Statistical %K Humans %K Models, Statistical %K Psychometrics %X Item banks and Computerized Adaptive Testing (CAT) have the potential to greatly improve the assessment of health outcomes. This review describes the unique features of item banks and CAT and discusses how to develop item banks. In CAT, a computer selects the items from an item bank that are most relevant for and informative about the particular respondent; thus optimizing test relevance and precision. 
Item response theory (IRT) provides the foundation for selecting the items that are most informative for the particular respondent and for scoring responses on a common metric. The development of an item bank is a multi-stage process that requires a clear definition of the construct to be measured, good items, a careful psychometric analysis of the items, and a clear specification of the final CAT. The psychometric analysis needs to evaluate the assumptions of the IRT model such as unidimensionality and local independence; that the items function the same way in different subgroups of the population; and that there is an adequate fit between the data and the chosen item response models. Also, interpretation guidelines need to be established to help the clinical application of the assessment. Although medical research can draw upon expertise from educational testing in the development of item banks and CAT, the medical field also encounters unique opportunities and challenges. %B Quality of Life Research %7 2007/05/29 %V 16 %P 95-108 %@ 0962-9343 (Print) %G eng %M 17530450 %0 Journal Article %J Quality of Life Research %D 2007 %T Development and evaluation of a computer adaptive test for “Anxiety” (Anxiety-CAT) %A Walter, O. B. %A Becker, J. %A Bjorner, J. B. %A Fliege, H. %A Klapp, B. F. %A Rose, M. %B Quality of Life Research %V 16 %P 143-155 %G eng %0 Book Section %D 2007 %T Development of a multiple-component CAT for measuring foreign language proficiency (SIMTEST) %A Sumbling, M. %A Sanz, P. %A Viladrich, M. C. %A Doval, E. %A Riera, L. %C D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J International Journal of Web-Based Learning and Teaching Technologies %D 2007 %T Evaluation of computer adaptive testing systems %A Economides, A. A. 
%A Roupas, C %K computer adaptive testing systems %K examination organizations %K systems evaluation %X Many educational organizations are trying to reduce the cost of the exams, the workload and delay of scoring, and the human errors. Also, they try to increase the accuracy and efficiency of the testing. Recently, most examination organizations use computer adaptive testing (CAT) as the method for large scale testing. This article investigates the current state of CAT systems and identifies their strengths and weaknesses. It evaluates 10 CAT systems using an evaluation framework of 15 domains categorized into three dimensions: educational, technical, and economical. The results show that the majority of the CAT systems give priority to security, reliability, and maintainability. However, they do not offer to the examinee any advanced support and functionalities. Also, the feedback to the examinee is limited and the presentation of the items is poor. Recommendations are made in order to enhance the overall quality of a CAT system. For example, alternative multimedia items should be available so that the examinee would choose a preferred media type. Feedback could be improved by providing more information to the examinee or providing information anytime the examinee wished. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B International Journal of Web-Based Learning and Teaching Technologies %I IGI Global: US %V 2 %P 70-87 %@ 1548-1093 (Print); 1548-1107 (Electronic) %G eng %M 2007-04391-004 %0 Book Section %D 2007 %T Implementing the Graduate Management Admission Test® computerized adaptive test %A Rudner, L. M. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Psycho-Oncology %D 2007 %T The initial development of an item bank to assess and screen for psychological distress in cancer patients %A Smith, A. B. %A Rush, R. %A Velikova, G. %A Wall, L. %A Wright, E. P. %A Stark, D. 
%A Selby, P. %A Sharpe, M. %K 3293 Cancer %K cancer patients %K Distress %K initial development %K Item Response Theory %K Models %K Neoplasms %K Patients %K Psychological %K psychological distress %K Rasch %K Stress %X Psychological distress is a common problem among cancer patients. Despite the large number of instruments that have been developed to assess distress, their utility remains disappointing. This study aimed to use Rasch models to develop an item-bank which would provide the basis for better means of assessing psychological distress in cancer patients. An item bank was developed from eight psychological distress questionnaires using Rasch analysis to link common items. Items from the questionnaires were added iteratively with common items as anchor points and misfitting items (infit mean square > 1.3) removed, and unidimensionality assessed. A total of 4914 patients completed the questionnaires providing an initial pool of 83 items. Twenty items were removed resulting in a final pool of 63 items. Good fit was demonstrated and no additional factor structure was evident from the residuals. However, there was little overlap between item locations and person measures, since items mainly targeted higher levels of distress. The Rasch analysis allowed items to be pooled and generated a unidimensional instrument for measuring psychological distress in cancer patients. Additional items are required to more accurately assess patients across the whole continuum of psychological distress. (PsycINFO Database Record (c) 2007 APA ) (journal abstract) %B Psycho-Oncology %V 16 %P 724-732 %@ 1057-9249 %G English %M 2007-12507-004 %0 Journal Article %J Quality of Life Research %D 2007 %T IRT health outcomes data analysis project: an overview and summary %A Cook, K. F. %A Teal, C. R. %A Bjorner, J. B. %A Cella, D. %A Chang, C-H. %A Crane, P. K. %A Gibbons, L. E. %A Hays, R. D. %A McHorney, C. A. %A Ocepek-Welikson, K. %A Raczek, A. E. %A Teresi, J. A. %A Reeve, B. B. 
%K *Data Interpretation, Statistical %K *Health Status %K *Quality of Life %K *Questionnaires %K *Software %K Female %K HIV Infections/psychology %K Humans %K Male %K Neoplasms/psychology %K Outcome Assessment (Health Care)/*methods %K Psychometrics %K Stress, Psychological %X BACKGROUND: In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, "Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment." A component of the conference was presentation of a psychometric and content analysis of a secondary dataset. OBJECTIVES: A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset. RESEARCH DESIGN: HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared. SUBJECTS: The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites. MEASURES: Items from 4 HRQOL instruments were evaluated: Cancer Rehabilitation Evaluation System-Short Form, European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, Functional Assessment of Cancer Therapy and Medical Outcomes Study Short-Form Health Survey. RESULTS AND CONCLUSIONS: Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance in the measurement of HRQOL of construct definition. 
With respect to these lessons, areas for future research are suggested. The feasibility of developing item banks for broad definitions of health is discussed. %B Quality of Life Research %7 2007/03/14 %V 16 %P 121-132 %@ 0962-9343 (Print) %G eng %M 17351824 %0 Journal Article %J Quality of Life Research %D 2007 %T Methodological issues for building item banks and computerized adaptive scales %A Thissen, D. %A Reeve, B. B. %A Bjorner, J. B. %A Chang, C-H. %X This paper reviews important methodological considerations for developing item banks and computerized adaptive scales (commonly called computerized adaptive tests in the educational measurement literature, yielding the acronym CAT), including issues of the reference population, dimensionality, dichotomous versus polytomous response scales, differential item functioning (DIF) and conditional scoring, mode effects, the impact of local dependence, and innovative approaches to assessment using CATs in health outcomes research. %B Quality of Life Research %V 16 %P 109-119 %@ 0962-9343 (Print); 1573-2649 (Electronic) %G eng %0 Book Section %D 2007 %T Patient-reported outcomes measurement and computerized adaptive testing: An application of post-hoc simulation to a diagnostic screening instrument %A Immekus, J. C. %A Gibbons, R. D. %A Rush, J. A. %C D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Medical Care %D 2007 %T The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years %A Cella, D. %A Yount, S. %A Rothrock, N. %A Gershon, R. C. %A Cook, K. F. %A Reeve, B. %A Ader, D. %A Fries, J. F. %A Bruce, B. %A Rose, M. 
%X The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) Roadmap initiative (www.nihpromis.org) is a 5-year cooperative group program of research designed to develop, validate, and standardize item banks to measure patient-reported outcomes (PROs) relevant across common medical conditions. In this article, we will summarize the organization and scientific activity of the PROMIS network during its first 2 years. %B Medical Care %V 45 %P S3-S11 %G eng %0 Journal Article %J Medical Care %D 2007 %T Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS) %A Reeve, B. B. %A Hays, R. D. %A Bjorner, J. B. %A Cook, K. F. %A Crane, P. K. %A Teresi, J. A. %A Thissen, D. %A Revicki, D. A. %A Weiss, D. J. %A Hambleton, R. K. %A Liu, H. %A Gershon, R. C. %A Reise, S. P. %A Lai, J. S. %A Cella, D. %K *Health Status %K *Information Systems %K *Quality of Life %K *Self Disclosure %K Adolescent %K Adult %K Aged %K Calibration %K Databases as Topic %K Evaluation Studies as Topic %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %K Psychometrics %K Questionnaires/standards %K United States %X BACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. 
ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment. %B Medical Care %7 2007/04/20 %V 45 %P S22-31 %8 May %@ 0025-7079 (Print) %G eng %M 17443115 %0 Journal Article %J European Journal of Psychological Assessment %D 2007 %T Psychometric properties of an emotional adjustment measure: An application of the graded response model %A Rubio, V. J. %A Aguado, D. %A Hontangas, P. M. %A Hernández, J. M. %K computerized adaptive tests %K Emotional Adjustment %K Item Response Theory %K Personality Measures %K personnel recruitment %K Psychometrics %K Samejima's graded response model %K test reliability %K validity %X Item response theory (IRT) provides valuable methods for the analysis of the psychometric properties of a psychological measure. However, IRT has been mainly used for assessing achievements and ability rather than personality factors. This paper presents an application of IRT to a personality measure. Thus, the psychometric properties of a new emotional adjustment measure consisting of 28 six-category graded-response items are shown. Classical test theory (CTT) analyses as well as IRT analyses are carried out. Samejima's (1969) graded-response model has been used for estimating item parameters. Results show that the bank of items fulfills model assumptions and fits the data reasonably well, demonstrating the suitability of the IRT models for the description and use of data originating from personality measures. 
In this sense, the model fulfills the expectations that IRT has undoubted advantages: (1) The invariance of the estimated parameters, (2) the treatment given to the standard error of measurement, and (3) the possibilities offered for the construction of computerized adaptive tests (CAT). The bank of items shows good reliability. It also shows convergent validity compared to the Eysenck Personality Inventory (EPQ-A; Eysenck & Eysenck, 1975) and the Big Five Questionnaire (BFQ; Caprara, Barbaranelli, & Borgogni, 1993). (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B European Journal of Psychological Assessment %I Hogrefe & Huber Publishers GmbH: Germany %V 23 %P 39-46 %@ 1015-5759 (Print) %G eng %M 2007-01587-007 %0 Journal Article %J Journal of Applied Measurement %D 2007 %T Relative precision, efficiency and construct validity of different starting and stopping rules for a computerized adaptive test: The GAIN Substance Problem Scale %A Riley, B. B. %A Conrad, K. J. %A Bezruczko, N. %A Dennis, M. L. %K My article %X Substance abuse treatment programs are being pressed to measure and make clinical decisions more efficiently about an increasing array of problems. This computerized adaptive testing (CAT) simulation examined the relative efficiency, precision and construct validity of different starting and stopping rules used to shorten the Global Appraisal of Individual Needs’ (GAIN) Substance Problem Scale (SPS) and facilitate diagnosis based on it. Data came from 1,048 adolescents and adults referred to substance abuse treatment centers in 5 sites. CAT performance was evaluated using: (1) average standard errors, (2) average number of items, (3) bias in personmeasures, (4) root mean squared error of person measures, (5) Cohen’s kappa to evaluate CAT classification compared to clinical classification, (6) correlation between CAT and full-scale measures, and (7) construct validity of CAT classification vs. 
clinical classification using correlations with five theoretically associated instruments. Results supported both CAT efficiency and validity. %B Journal of Applied Measurement %V 8 %P 48-65 %G eng %0 Conference Proceedings %B American Evaluation Association %D 2007 %T The use of computerized adaptive testing to assess psychopathology using the Global Appraisal of Individual Needs %A Conrad, K. J. %A Riley, B. B. %A Dennis, M. L. %B American Evaluation Association %I American Evaluation Association %C Portland, OR USA %8 November %G eng %0 Report %D 2006 %T Kernel-smoothed DIF detection procedure for computerized adaptive tests (Computerized testing report 00-08) %A Nandakumar, R. %A Banks, J. C. %A Roussos, L. A. %I Law School Admission Council %C Newton, PA %G eng %0 Journal Article %J Applied Psychological Measurement %D 2006 %T SIMCAT 1.0: A SAS computer program for simulating computer adaptive testing %A Raîche, G. %A Blais, J-G. %K computer adaptive testing %K computer program %K estimated proficiency level %K Monte Carlo methodologies %K Rasch logistic model %X Monte Carlo methodologies are frequently applied to study the sampling distribution of the estimated proficiency level in adaptive testing. These methods eliminate real situational constraints. However, these Monte Carlo methodologies are not currently supported by the available software programs, and when these programs are available, their flexibility is limited. SIMCAT 1.0 is aimed at the simulation of adaptive testing sessions under different adaptive expected a posteriori (EAP) proficiency-level estimation methods (Blais & Raîche, 2005; Raîche & Blais, 2005) based on the one-parameter Rasch logistic model. These methods are all adaptive in the a priori proficiency-level estimation, the proficiency-level estimation bias correction, the integration interval, or a combination of these factors. 
The use of these adaptive EAP estimation methods diminishes considerably the shrinking, and therefore biasing, effect of the estimated a priori proficiency level encountered when this a priori is fixed at a constant value independently of the computed previous value of the proficiency level. SIMCAT 1.0 also computes empirical and estimated skewness and kurtosis coefficients, such as the standard error, of the estimated proficiency-level sampling distribution. In this way, the program allows one to compare empirical and estimated properties of the estimated proficiency-level sampling distribution under different variations of the EAP estimation method: standard error and bias, like the skewness and kurtosis coefficients. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Applied Psychological Measurement %I Sage Publications: US %V 30 %P 60-61 %@ 0146-6216 (Print) %G eng %M 2005-16359-005 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2005 %T Assessing mobility in children using a computer adaptive testing version of the pediatric evaluation of disability inventory %A Haley, S. M. %A Raczek, A. E. %A Coster, W. J. %A Dumas, H. M. %A Fragala-Pinkham, M. A. 
%K *Computer Simulation %K *Disability Evaluation %K Adolescent %K Child %K Child, Preschool %K Cross-Sectional Studies %K Disabled Children/*rehabilitation %K Female %K Humans %K Infant %K Male %K Outcome Assessment (Health Care)/*methods %K Rehabilitation Centers %K Rehabilitation/*standards %K Sensitivity and Specificity %X OBJECTIVE: To assess score agreement, validity, precision, and response burden of a prototype computerized adaptive testing (CAT) version of the Mobility Functional Skills Scale (Mob-CAT) of the Pediatric Evaluation of Disability Inventory (PEDI) as compared with the full 59-item version (Mob-59). DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; and cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics, community-based day care, preschool, and children's homes. PARTICIPANTS: Four hundred sixty-nine children with disabilities and 412 children with no disabilities (analytic sample); 41 children without disabilities and 39 with disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from a prototype Mob-CAT application and versions using 15-, 10-, and 5-item stopping rules; scores from the Mob-59; and number of items and time (in seconds) to administer assessments. RESULTS: Mob-CAT scores from both computer simulations (intraclass correlation coefficient [ICC] range, .94-.99) and field administrations (ICC=.98) were in high agreement with scores from the Mob-59. Using computer simulations of retrospective data, discriminant validity, and sensitivity to change of the Mob-CAT closely approximated that of the Mob-59, especially when using the 15- and 10-item stopping rule versions of the Mob-CAT. The Mob-CAT used no more than 15% of the items for any single administration, and required 20% of the time needed to administer the Mob-59. 
CONCLUSIONS: Comparable score estimates for the PEDI mobility scale can be obtained from CAT administrations, with losses in validity and precision for shorter forms, but with a considerable reduction in administration time. %B Archives of Physical Medicine and Rehabilitation %7 2005/05/17 %V 86 %P 932-9 %8 May %@ 0003-9993 (Print) %G eng %M 15895339 %0 Journal Article %J Evaluation and the Health Professions %D 2005 %T Data pooling and analysis to build a preliminary item bank: an example using bowel function in prostate cancer %A Eton, D. T. %A Lai, J. S. %A Cella, D. %A Reeve, B. B. %A Talcott, J. A. %A Clark, J. A. %A McPherson, C. P. %A Litwin, M. S. %A Moinpour, C. M. %K *Quality of Life %K *Questionnaires %K Adult %K Aged %K Data Collection/methods %K Humans %K Intestine, Large/*physiopathology %K Male %K Middle Aged %K Prostatic Neoplasms/*physiopathology %K Psychometrics %K Research Support, Non-U.S. Gov't %K Statistics, Nonparametric %X Assessing bowel function (BF) in prostate cancer can help determine therapeutic trade-offs. We determined the components of BF commonly assessed in prostate cancer studies as an initial step in creating an item bank for clinical and research application. We analyzed six archived data sets representing 4,246 men with prostate cancer. Thirty-one items from validated instruments were available for analysis. Items were classified into domains (diarrhea, rectal urgency, pain, bleeding, bother/distress, and other) then subjected to conventional psychometric and item response theory (IRT) analyses. Items fit the IRT model if the ratio between observed and expected item variance was between 0.60 and 1.40. Four of 31 items had inadequate fit in at least one analysis. Poorly fitting items included bleeding (2), rectal urgency (1), and bother/distress (1). A fifth item assessing hemorrhoids was poorly correlated with other items. 
Our analyses supported four related components of BF: diarrhea, rectal urgency, pain, and bother/distress. %B Evaluation and the Health Professions %V 28 %P 142-59 %G eng %M 15851770 %0 Journal Article %J Quality of Life Research %D 2005 %T Development of a computer-adaptive test for depression (D-CAT) %A Fliege, H. %A Becker, J. %A Walter, O. B. %A Bjorner, J. B. %A Klapp, B. F. %A Rose, M. %B Quality of Life Research %V 14 %P 2277–2291 %G eng %0 Journal Article %J Health Services Research %D 2005 %T Dynamic assessment of health outcomes: Time to let the CAT out of the bag? %A Cook, K. F. %A O'Malley, K. J. %A Roddey, T. S. %K computer adaptive testing %K Item Response Theory %K self reported health outcomes %X Background: The use of item response theory (IRT) to measure self-reported outcomes has burgeoned in recent years. Perhaps the most important application of IRT is computer-adaptive testing (CAT), a measurement approach in which the selection of items is tailored for each respondent. Objective. To provide an introduction to the use of CAT in the measurement of health outcomes, describe several IRT models that can be used as the basis of CAT, and discuss practical issues associated with the use of adaptive scaling in research settings. Principal Points: The development of a CAT requires several steps that are not required in the development of a traditional measure including identification of "starting" and "stopping" rules. CAT's most attractive advantage is its efficiency. Greater measurement precision can be achieved with fewer items. Disadvantages of CAT include the high cost and level of technical expertise required to develop a CAT. Conclusions: Researchers, clinicians, and patients benefit from the availability of psychometrically rigorous measures that are not burdensome. CAT outcome measures hold substantial promise in this regard, but their development is not without challenges. 
(PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Health Services Research %I Blackwell Publishing: United Kingdom %V 40 %P 1694-1711 %@ 0017-9124 (Print); 1475-6773 (Electronic) %G eng %M 2006-02162-008 %0 Book Section %D 2005 %T Features of the estimated sampling distribution of the ability estimate in computerized adaptive testing according to two stopping rules %A Blais, J-G. %A Raîche, G. %C D. G. Englehard (Eds.), Objective measurement: Theory into practice. Volume 6. %G eng %0 Journal Article %J Journal of Pain and Symptom Management %D 2005 %T An item response theory-based pain item bank can enhance measurement precision %A Lai, J-S. %A Dineen, K. %A Reeve, B. B. %A Von Roenn, J. %A Shervin, D. %A McGuire, M. %A Bode, R. K. %A Paice, J. %A Cella, D. %K computerized adaptive testing %X Cancer-related pain is often under-recognized and undertreated. This is partly due to the lack of appropriate assessments, which need to be comprehensive and precise yet easily integrated into clinics. Computerized adaptive testing (CAT) can enable precise-yet-brief assessments by only selecting the most informative items from a calibrated item bank. The purpose of this study was to create such a bank. The sample included 400 cancer patients who were asked to complete 61 pain-related items. Data were analyzed using factor analysis and the Rasch model. The final bank consisted of 43 items which satisfied the measurement requirement of factor analysis and the Rasch model, demonstrated high internal consistency and reasonable item-total correlations, and discriminated patients with differing degrees of pain. We conclude that this bank demonstrates good psychometric properties, is sensitive to pain reported by patients, and can be used as the foundation for a CAT pain-testing platform for use in clinical practice. 
%B Journal of Pain and Symptom Management %V 30 %P 278-88 %G eng %M 16183012 %0 Journal Article %J Psicothema %D 2005 %T Propiedades psicométricas de un test Adaptativo Informatizado para la medición del ajuste emocional [Psychometric properties of an Emotional Adjustment Computerized Adaptive Test] %A Aguado, D. %A Rubio, V. J. %A Hontangas, P. M. %A Hernández, J. M. %K Computer Assisted Testing %K Emotional Adjustment %K Item Response Theory %K Personality Measures %K Psychometrics %K Test Validity %X Psychometric properties of an emotional adjustment computerized adaptive test. The present work describes the psychometric properties of a computerized adaptive test for measuring emotional adjustment. An examination of the item response theory (IRT) research literature indicates that IRT has been used mainly for assessing achievement and ability rather than personality factors. Nevertheless, recent years have produced several studies that successfully applied IRT to personality assessment instruments. 
Even so, a few amount of works has inquired the computerized adaptative test features, based on IRT, for the measurement of a personality traits as it’s the emotional adjustment. Our results show the CAT efficiency for the emotional adjustment assessment so this provides a valid and accurate measurement; by using a less number of items in comparison with the emotional adjustment scales from the most strongly established questionnaires. %B Psicothema %V 17 %P 484-491 %G eng %0 Journal Article %J Alcoholism: Clinical & Experimental Research %D 2005 %T Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire %A Kahler, C. W. %A Strong, D. R. %A Read, J. P. %A De Boeck, P. %A Wilson, M. %A Acton, G. S. %A Palfai, T. P. %A Wood, M. D. %A Mehta, P. D. %A Neale, M. C. %A Flay, B. R. %A Conklin, C. A. %A Clayton, R. R. %A Tiffany, S. T. %A Shiffman, S. %A Krueger, R. F. %A Nichol, P. E. %A Hicks, B. M. %A Markon, K. E. %A Patrick, C. J. %A Iacono, William G. %A McGue, Matt %A Langenbucher, J. W. %A Labouvie, E. %A Martin, C. S. %A Sanjuan, P. M. %A Bavly, L. %A Kirisci, L. %A Chung, T. %A Vanyukov, M. %A Dunn, M. %A Tarter, R. %A Handel, R. W. %A Ben-Porath, Y. S. %A Watt, M. %K Psychometrics %K Substance-Related Disorders %X Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. 
In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias., Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis. An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample., Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items., Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided., (C)2005Research Society on AlcoholismAn important, sometimes controversial feature of all psychological phenomena is whether they are categorical or dimensional. 
A conceptual and psychometric framework is described for distinguishing whether the latent structure behind manifest categories (e.g., psychiatric diagnoses, attitude groups, or stages of development) is category-like or dimension-like. Being dimension-like requires (a) within-category heterogeneity and (b) between-category quantitative differences. Being category-like requires (a) within-category homogeneity and (b) between-category qualitative differences. The relation between this classification and abrupt versus smooth differences is discussed. Hybrid structures are possible. Being category-like is itself a matter of degree; the authors offer a formalized framework to determine this degree. Empirical applications to personality disorders, attitudes toward capital punishment, and stages of cognitive development illustrate the approach., (C) 2005 by the American Psychological AssociationThe authors conducted Rasch model ( G. Rasch, 1960) analyses of items from the Young Adult Alcohol Problems Screening Test (YAAPST; S. C. Hurlbut & K. J. Sher, 1992) to examine the relative severity and ordering of alcohol problems in 806 college students. Items appeared to measure a single dimension of alcohol problem severity, covering a broad range of the latent continuum. Items fit the Rasch model well, with less severe symptoms reliably preceding more severe symptoms in a potential progression toward increasing levels of problem severity. However, certain items did not index problem severity consistently across demographic subgroups. A shortened, alternative version of the YAAPST is proposed, and a norm table is provided that allows for a linking of total YAAPST scores to expected symptom expression., (C) 2004 by the American Psychological AssociationA didactic on latent growth curve modeling for ordinal outcomes is presented. 
The conceptual aspects of modeling growth with ordinal variables and the notion of threshold invariance are illustrated graphically using a hypothetical example. The ordinal growth model is described in terms of 3 nested models: (a) multivariate normality of the underlying continuous latent variables (yt) and its relationship with the observed ordinal response pattern (Yt), (b) threshold invariance over time, and (c) growth model for the continuous latent variable on a common scale. Algebraic implications of the model restrictions are derived, and practical aspects of fitting ordinal growth models are discussed with the help of an empirical example and Mx script (M. C. Neale, S. M. Boker, G. Xie, & H. H. Maes, 1999). The necessary conditions for the identification of growth models with ordinal data and the methodological implications of the model of threshold invariance are discussed. (C) 2004 by the American Psychological Association. Recent research points toward the viability of conceptualizing alcohol problems as arrayed along a continuum. Nevertheless, modern statistical techniques designed to scale multiple problems along a continuum (latent trait modeling; LTM) have rarely been applied to alcohol problems. This study applies LTM methods to data on 110 problems reported during in-person interviews of 1,348 middle-aged men (mean age = 43) from the general population. The results revealed a continuum of severity linking the 110 problems, ranging from heavy and abusive drinking, through tolerance and withdrawal, to serious complications of alcoholism. These results indicate that alcohol problems can be arrayed along a dimension of severity and emphasize the relevance of LTM to informing the conceptualization and assessment of alcohol problems. (C) 2004 by the American Psychological Association. Item response theory (IRT) is supplanting classical test theory as the basis for measures development. 
This study demonstrated the utility of IRT for evaluating DSM-IV diagnostic criteria. Data on alcohol, cannabis, and cocaine symptoms from 372 adult clinical participants interviewed with the Composite International Diagnostic Interview-Expanded Substance Abuse Module (CIDI-SAM) were analyzed with Mplus (B. Muthen & L. Muthen, 1998) and MULTILOG (D. Thissen, 1991) software. Tolerance and legal problems criteria were dropped because of poor fit with a unidimensional model. Item response curves, test information curves, and testing of variously constrained models suggested that DSM-IV criteria in the CIDI-SAM discriminate between only impaired and less impaired cases and may not be useful to scale case severity. IRT can be used to study the construct validity of DSM-IV diagnoses and to identify diagnostic criteria with poor performance. (C) 2004 by the American Psychological Association. This study examined the psychometric characteristics of an index of substance use involvement using item response theory. The sample consisted of 292 men and 140 women who qualified for a Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.; American Psychiatric Association, 1987) substance use disorder (SUD) diagnosis and 293 men and 445 women who did not qualify for a SUD diagnosis. The results indicated that men had a higher probability of endorsing substance use compared with women. The index significantly predicted health, psychiatric, and psychosocial disturbances as well as level of substance use behavior and severity of SUD after a 2-year follow-up. 
Finally, this index is a reliable and useful prognostic indicator of the risk for SUD and the medical and psychosocial sequelae of drug consumption. (C) 2002 by the American Psychological Association. Comparability, validity, and impact of loss of information of a computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 140 Veterans Affairs hospital patients. The countdown method (Butcher, Keller, & Bacon, 1985) was used to adaptively administer Scales L (Lie) and F (Frequency), the 10 clinical scales, and the 15 content scales. Participants completed the MMPI-2 twice, in 1 of 2 conditions: computerized conventional test-retest, or computerized conventional-computerized adaptive. Mean profiles and test-retest correlations across modalities were comparable. Correlations between MMPI-2 scales and criterion measures supported the validity of the countdown method, although some attenuation of validity was suggested for certain health-related items. Loss of information incurred with this mode of adaptive testing has minimal impact on test validity. Item and time savings were substantial. (C) 1999 by the American Psychological Association %B Alcoholism: Clinical & Experimental Research %V 29 %P 1180-1189 %G eng %0 Book Section %B Evidence-based educational methods %D 2004 %T Adaptive computerized educational systems: A case study %A Ray, R. D. %E R. W. Malott %K Artificial %K Computer Assisted Instruction %K Computer Software %K Higher Education %K Individualized %K Instruction %K Intelligence %K Internet %K Undergraduate Education %X (Created by APA) Adaptive instruction describes adjustments typical of one-on-one tutoring as discussed in the college tutorial scenario. 
So computerized adaptive instruction refers to the use of computer software--almost always incorporating artificially intelligent services--which has been designed to adjust both the presentation of information and the form of questioning to meet the current needs of an individual learner. This chapter describes a system for Internet-delivered adaptive instruction. The author attempts to demonstrate a sharp difference between the teaching that takes place outside of the classroom in universities and the kind that is at least afforded, if not taken advantage of by many, students in a more personalized educational setting such as those in the small liberal arts colleges. The author describes a computer-based technology that allows that gap to be bridged with the advantage of at least having more highly prepared learners sitting in college classrooms. A limited range of emerging research that supports that proposition is cited. (PsycINFO Database Record (c) 2005 APA ) %B Evidence-based educational methods %S Educational Psychology Series %I Elsevier Academic Press %C San Diego, CA. USA %P 143-169 %G eng %& 10 %0 Journal Article %J European Journal of Psychological Assessment %D 2004 %T Assisted self-adapted testing: A comparative study %A Hontangas, P. %A Olea, J. %A Ponsoda, V. %A Revuelta, J. %A Wise, S. L. %K Adaptive Testing %K Anxiety %K Computer Assisted Testing %K Psychometrics %K Test %X A new type of self-adapted test (S-AT), called Assisted Self-Adapted Test (AS-AT), is presented. It differs from an ordinary S-AT in that prior to selecting the difficulty category, the computer advises examinees on their best difficulty category choice, based on their previous performance. Three tests (computerized adaptive test, AS-AT, and S-AT) were compared regarding both their psychometric (precision and efficiency) and psychological (anxiety) characteristics. Tests were applied in an actual assessment situation, in which test scores determined 20% of term grades. 
A sample of 173 high school students participated. No differences in either posttest anxiety or ability were found. Concerning precision, AS-AT was as precise as CAT, and both revealed more precision than S-AT. It was concluded that AS-AT acted as a CAT concerning precision. Some hints of psychological similarity between AS-AT and S-AT, though not conclusive support, were also found. (PsycINFO Database Record (c) 2005 APA) (journal abstract) %B European Journal of Psychological Assessment %V 20 %P 2-9 %G eng %0 Journal Article %J Medical Teacher %D 2004 %T A computerized adaptive knowledge test as an assessment tool in general practice: a pilot study %A Roex, A. %A Degryse, J. %K *Computer Systems %K Algorithms %K Educational Measurement/*methods %K Family Practice/*education %K Humans %K Pilot Projects %X Advantageous to assessment in many fields, CAT (computerized adaptive testing) use in general practice has been scarce. In adapting CAT to general practice, the basic assumptions of item response theory and the case specificity must be taken into account. In this context, this study first evaluated the feasibility of converting written extended matching tests into CAT. Second, it questioned the content validity of CAT. A stratified sample of students was invited to participate in the pilot study. The items used in this test, together with their parameters, originated from the written test. The detailed test paths of the students were retained and analysed thoroughly. Using the predefined pass-fail standard, one student failed the test. There was a positive correlation between the number of items and the candidate's ability level. The majority of students were presented with questions in seven of the 10 existing domains. Although it proved to be a feasible test format, CAT cannot substitute for the existing high-stakes large-scale written test. It may provide a reliable instrument for identifying candidates who are at risk of failing in the written test. 
%B Medical Teacher %V 26 %P 178-83 %8 Mar %G eng %M 15203528 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2004 %T Estimating ability and item-selection strategy in self-adapted testing: A latent class approach %A Revuelta, J. %K estimating ability %K item-selection strategies %K psychometric model %K self-adapted testing %X This article presents a psychometric model for estimating ability and item-selection strategies in self-adapted testing. In contrast to computer adaptive testing, in self-adapted testing the examinees are allowed to select the difficulty of the items. The item-selection strategy is defined as the distribution of difficulty conditional on the responses given to previous items. The article shows that missing responses in self-adapted testing are missing at random and can be ignored in the estimation of ability. However, the item-selection strategy cannot always be ignored in such an estimation. An EM algorithm is presented to estimate an examinee's ability and strategies, and a model fit is evaluated using Akaike's information criterion. The article includes an application with real data to illustrate how the model can be used in practice for evaluating hypotheses, estimating ability, and identifying strategies. In the example, four strategies were identified and related to examinees' ability. It was shown that individual examinees tended not to follow a consistent strategy throughout the test. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Educational and Behavioral Statistics %I American Educational Research Assn: US %V 29 %P 379-396 %@ 1076-9986 (Print) %G eng %M 2005-00264-002 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2004 %T Evaluation of the CATSIB DIF procedure in a pretest setting %A Nandakumar, R. %A Roussos, L. A. 
%K computerized adaptive tests %K differential item functioning %X A new procedure, CATSIB, for assessing differential item functioning (DIF) on computerized adaptive tests (CATs) is proposed. CATSIB, a modified SIBTEST procedure, matches test takers on estimated ability and controls for impact-induced Type I error inflation by employing a CAT version of the SIBTEST "regression correction." The performance of CATSIB in terms of detection of DIF in pretest items was evaluated in a simulation study. Simulated test takers were adaptively administered 25 operational items from a pool of 1,000 and were linearly administered 16 pretest items that were evaluated for DIF. Sample size varied from 250 to 500 in each group. Simulated impact levels ranged from a 0- to 1-standard-deviation difference in mean ability levels. The results showed that CATSIB with the regression correction displayed good control over Type I error, whereas CATSIB without the regression correction displayed impact-induced Type I error inflation. With 500 test takers in each group, power rates were exceptionally high (84% to 99%) for values of DIF at the boundary between moderate and large DIF. For smaller samples of 250 test takers in each group, the corresponding power rates ranged from 47% to 95%. In addition, in all cases, CATSIB was very accurate in estimating the true values of DIF, displaying at most only minor estimation bias. 
(PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Educational and Behavioral Statistics %I American Educational Research Assn: US %V 29 %P 177-199 %@ 1076-9986 (Print) %G eng %M 2004-19188-002 %0 Journal Article %J International Journal of Artificial Intelligence in Education %D 2004 %T Siette: a web-based tool for adaptive testing %A Conejo, R %A Guzmán, E %A Millán, E %A Trella, M %A Pérez-De-La-Cruz, JL %A Ríos, A %K computerized adaptive testing %B International Journal of Artificial Intelligence in Education %V 14 %P 29-61 %G eng %0 Journal Article %J Quality of Life Research %D 2004 %T Validating the German computerized adaptive test for anxiety on healthy sample (A-CAT) %A Becker, J. %A Walter, O. B. %A Fliege, H. %A Bjorner, J. B. %A Kocalevent, R. D. %A Schmid, G. %A Klapp, B. F. %A Rose, M. %B Quality of Life Research %V 13 %P 1515 %G eng %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 2003 %T Comparison of multi-stage tests with computer adaptive and paper and pencil tests %A Rotou, O. %A Patsula, L. %A Steffen, M. %A Rizavi, S. %B Paper presented at the Annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Medical Care (in press) %D 2003 %T Development and psychometric evaluation of the Flexilevel Scale of Shoulder Function (FLEX-SF) %A Cook, K. F. %A Roddey, T. S. %A Gartsman, G M %A Olson, S L %B Medical Care (in press) %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2003 %T Evaluating a new approach to detect aberrant responses in CAT %A Lu, Y., %A Robin, F. 
%B Paper presented at the annual meeting of the American Educational Research Association %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Evaluating computerized adaptive testing design for the MCAT with realistic simulated data %A Lu, Y. %A Pitoniak, M. %A Rizavi, S. %A Way, W. D. %A Steffan, M. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Journal of Technology, Learning, and Assessment %D 2003 %T A feasibility study of on-the-fly item generation in adaptive testing %A Bejar, I. I. %A Lawless, R. R. %A Morley, M. E. %A Wagner, M. E. %A Bennett, R. E. %A Revuelta, J. %B Journal of Technology, Learning, and Assessment %V 2 %G eng %N 3 %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 2003 %T Item pool design for computerized adaptive tests %A Reckase, M. D. %B Paper presented at the Annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Maintaining scale in computer adaptive testing %A Smith, R. L. %A Rizavi, S. %A Paez, R. %A Damiano, M. %A Herbert, E. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Generic %D 2003 %T A method to determine targets for multi-stage adaptive tests %A Armstrong, R. D. %A Roussos, L. %C Unpublished manuscript %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Methods for item set selection in adaptive testing %A Lu, Y. %A Rizavi, S. 
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2003 %T Psychometric and Psychological Effects of Item Selection and Review on Computerized Testing %A Revuelta, Javier %A Ximénez, M. Carmen %A Olea, Julio %X

Psychometric properties of computerized testing, together with anxiety and comfort of examinees, are investigated in relation to item selection routine and the opportunity for response review. Two different hypotheses involving examinee anxiety were used to design test properties: perceived control and perceived performance. The study involved three types of administration of a computerized English test for Spanish speakers (adaptive, easy adaptive, and fixed) and four review conditions (no review, review at end, review by blocks of 5 items, and review item-by-item). These were applied to a sample of 557 first-year psychology undergraduate students to examine main and interaction effects of test type and review on psychometric and psychological variables. Statistically significant effects were found in test precision among the different types of test. Response review improved ability estimates and increased testing time. No psychological effects on anxiety were found. Examinees in all review conditions rated the possibility of review as more important than did those who were not allowed to review. These results concur with previous findings on examinees' preference for item review and raise some issues that should be addressed in the field of tests with item review.

%B Educational and Psychological Measurement %V 63 %P 791-808 %U http://epm.sagepub.com/content/63/5/791.abstract %R 10.1177/0013164403251282 %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Small sample estimation in dichotomous item response models: Effect of priors based on judgmental information on the accuracy of item parameter estimates %A Swaminathan, H. %A Hambleton, R. K. %A Sireci, S. G. %A Xing, D. %A Rizavi, S. M. %X Large item banks with properly calibrated test items are essential for ensuring the validity of computer-based tests. At the same time, item calibrations with small samples are desirable to minimize the amount of pretesting and limit item exposure. Bayesian estimation procedures show considerable promise with small examinee samples. The purposes of the study were (a) to examine how prior information for Bayesian item parameter estimation can be specified and (b) to investigate the relationship between sample size and the specification of prior information on the accuracy of item parameter estimates. The results of the simulation study were clear: Estimation of item response theory (IRT) model item parameters can be improved considerably. Improvements in the one-parameter model were modest; considerable improvements with the two- and three-parameter models were observed. Both the study of different forms of priors and ways to improve the judgmental data used in forming the priors appear to be promising directions for future research. 
%B Applied Psychological Measurement %V 27 %P 27-51 %G eng %0 Journal Article %J Zeitschrift für Differentielle und Diagnostische Psychologie %D 2003 %T Timing behavior in computerized adaptive testing: Response times for correct and incorrect answers are not related to general fluid intelligence/Zum Zeitverhalten beim computergestützten adaptiven Testen: Antwortlatenzen bei richtigen und falschen Lösungen %A Rammsayer, Thomas %A Brandler, Susanne %K Adaptive Testing %K Cognitive Ability %K Intelligence %K Perception %K Reaction Time %K computerized adaptive testing %X Examined the effects of general fluid intelligence on item response times for correct and false responses in computerized adaptive testing. After performing the CFT3 intelligence test, 80 individuals (aged 17-44 yrs) completed perceptual and cognitive discrimination tasks. Results show that response times were related neither to the proficiency dimension reflected by the task nor to the individual level of fluid intelligence. Furthermore, the false > correct-phenomenon as well as substantial positive correlations between item response times for false and correct responses were shown to be independent of intelligence levels. (PsycINFO Database Record (c) 2005 APA) %B Zeitschrift für Differentielle und Diagnostische Psychologie %V 24 %P 57-63 %G eng %0 Journal Article %J Drug and Alcohol Dependence %D 2002 %T Assessing tobacco beliefs among youth using item response theory models %A Panter, A. T. %A Reeve, B. B. %K *Attitude to Health %K *Culture %K *Health Behavior %K *Questionnaires %K Adolescent %K Adult %K Child %K Female %K Humans %K Male %K Models, Statistical %K Smoking/*epidemiology %X Successful intervention research programs to prevent adolescent smoking require well-chosen, psychometrically sound instruments for assessing smoking prevalence and attitudes. 
Twelve thousand eight hundred and ten adolescents were surveyed about their smoking beliefs as part of the Teenage Attitudes and Practices Survey project, a prospective cohort study of predictors of smoking initiation among US adolescents. Item response theory (IRT) methods are used to frame a discussion of questions that a researcher might ask when selecting an optimal item set. IRT methods are especially useful for choosing items during instrument development, trait scoring, evaluating item functioning across groups, and creating optimal item subsets for use in specialized applications such as computerized adaptive testing. Data analytic steps for IRT modeling are reviewed for evaluating item quality and differential item functioning across subgroups of gender, age, and smoking status. Implications and challenges in the use of these methods for tobacco onset research and for assessing the developmental trajectories of smoking among youth are discussed. %B Drug and Alcohol Dependence %V 68 %P S21-S39 %8 Nov %G eng %M 12324173 %0 Journal Article %J Dissertation Abstracts International Section A: Humanities & Social Sciences %D 2002 %T The effect of test characteristics on aberrant response patterns in computer adaptive testing %A Rizavi, S. M. %K computerized adaptive testing %X The advantages that computer adaptive testing offers over linear tests have been well documented. The Computer Adaptive Test (CAT) design is more efficient than the Linear test design as fewer items are needed to estimate an examinee's proficiency to a desired level of precision. In the ideal situation, a CAT will result in examinees answering different number of items according to the stopping rule employed. Unfortunately, the realities of testing conditions have necessitated the imposition of time and minimum test length limits on CATs. Such constraints might place a burden on the CAT test taker resulting in aberrant response behaviors by some examinees. 
Occurrence of such response patterns results in inaccurate estimation of examinee proficiency levels. This study examined the effects of test lengths, time limits, and the interaction of these factors with examinee proficiency levels on the occurrence of aberrant response patterns. The focus of the study was on the aberrant behaviors caused by rushed guessing due to restrictive time limits. Four different testing scenarios were examined: fixed-length performance tests with and without content constraints, fixed-length mastery tests, and variable-length mastery tests without content constraints. For each of these testing scenarios, the effects of two test lengths, five different timing conditions, and the interaction between these factors with three ability levels on ability estimation were examined. For fixed- and variable-length mastery tests, decision accuracy was also examined in addition to estimation accuracy. Several indices were used to evaluate the estimation and decision accuracy for different testing conditions. The results showed that changing time limits had a significant impact on the occurrence of aberrant response patterns conditional on ability. Increasing test length had a negligible, if not negative, effect on ability estimation when rushed guessing occurred. In performance testing, high-ability examinees suffered the most, whereas in classification testing, middle-ability examinees did. Decision accuracy was considerably affected for variable-length classification tests. (PsycINFO Database Record (c) 2003 APA, all rights reserved). 
%B Dissertation Abstracts International Section A: Humanities & Social Sciences %V 62 %P 3363 %G eng %0 Journal Article %J Mesure et évaluation en éducation %D 2002 %T Étude de la distribution d'échantillonnage de l'estimateur du niveau d'habileté en testing adaptatif en fonction de deux règles d'arrêt dans le contexte de l'application du modèle de Rasch [Study of the sampling distribution of the proficiency estimator in adaptive testing under two stopping rules in the context of the Rasch model] %A Raîche, G. %A Blais, J-G. %B Mesure et évaluation en éducation %V 24(2-3) %P 23-40 %G French %0 Journal Article %J Applied Psychological Measurement %D 2002 %T Evaluation of selection procedures for computerized adaptive testing with polytomous items %A van Rijn, P. W. %A Theo Eggen %A Hemker, B. T. %A Sanders, P. F. %K computerized adaptive testing %X In the present study, a procedure that has been used to select dichotomous items in computerized adaptive testing was applied to polytomous items. This procedure was designed to select the item with maximum weighted information. In a simulation study, the item information function was integrated over a fixed interval of ability values and the item with the maximum area was selected. This maximum interval information item selection procedure was compared to a maximum point information item selection procedure. Substantial differences between the two item selection procedures were not found when computerized adaptive tests were evaluated on bias and the root mean square of the ability estimate. %B Applied Psychological Measurement %V 26 %P 393-411 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T An examination of decision-theory adaptive testing procedures %A Rudner, L. M. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans, LA %G eng %0 Generic %D 2002 %T A feasibility study of on-the-fly item generation in adaptive testing (GRE Board Report No 98-12) %A Bejar, I. I. %A Lawless, R. R. %A Morley, M. E. %A Wagner, M. E. %A Bennett, R. E. %A Revuelta, J. %C Educational Testing Service RR02-23. Princeton, NJ: Educational Testing Service. %0 Journal Article %J Mesure et évaluation en éducation %D 2002 %T La simulation d'un test adaptatif basé sur le modèle de Rasch [Simulation of a Rasch-based adaptive test] %A Raîche, G. %B Mesure et évaluation en éducation %G French %0 Book Section %D 2002 %T Le testing adaptatif [Adaptive testing] %A Raîche, G. %C D. R. Bertrand and J.G. Blais (Eds.): Les théories modernes de la mesure [Modern theories of measurement]. Sainte-Foy: Presses de l'Université du Québec. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the International Objective Measurement Workshops-XI %D 2002 %T Practical considerations about expected a posteriori estimation in adaptive testing: Adaptive a priori, adaptive corrections for bias, adaptive integration interval %A Raiche, G. %A Blais, J. G. %B Paper presented at the annual meeting of the International Objective Measurement Workshops-XI %C New Orleans, LA %G eng %0 Conference Paper %B Communication proposée au 11e Biannual International objective measurement workshop. New-Orleans : International Objective Measurement Workshops. %D 2002 %T Some features of the estimated sampling distribution of the ability estimate in computerized adaptive testing according to two stopping rules %A Raîche, G. %A Blais, J. G. 
%B Communication proposée au 11e Biannual International objective measurement workshop. New-Orleans : International Objective Measurement Workshops. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the International Objective Measurement Workshops-XI %D 2002 %T Some features of the sampling distribution of the ability estimate in computerized adaptive testing according to two stopping rules %A Blais, J-G. %A Raiche, G. %B Paper presented at the annual meeting of the International Objective Measurement Workshops-XI %C New Orleans, LA %G eng %0 Generic %D 2002 %T STAR Math 2 Computer-Adaptive Math Test and Database: Technical Manual %A Renaissance-Learning-Inc. %C Wisconsin Rapids, WI: Author %G eng %0 Journal Article %J Assessment %D 2002 %T A structure-based approach to psychological measurement: Matching measurement models to latent structure %A Ruscio, John %A Ruscio, Ayelet Meron %K Adaptive Testing %K Assessment %K Classification (Cognitive Process) %K Computer Assisted %K Item Response Theory %K Psychological %K Scaling (Testing) %K Statistical Analysis computerized adaptive testing %K Taxonomies %K Testing %X The present article sets forth the argument that psychological assessment should be based on a construct's latent structure. The authors differentiate dimensional (continuous) and taxonic (categorical) structures at the latent and manifest levels and describe the advantages of matching the assessment approach to the latent structure of a construct. A proper match will decrease measurement error, increase statistical power, clarify statistical relationships, and facilitate the location of an efficient cutting score when applicable. Thus, individuals will be placed along a continuum or assigned to classes more accurately. 
The authors briefly review the methods by which latent structure can be determined and outline a structure-based approach to assessment that builds on dimensional scaling models, such as item response theory, while incorporating classification methods as appropriate. Finally, the authors empirically demonstrate the utility of their approach and discuss its compatibility with traditional assessment methods and with computerized adaptive testing. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Assessment %V 9 %P 4-16 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2002 %T Updated item parameter estimates using sparse CAT data %A Smith, R. L. %A Rizavi, S. %A Paez, R. %A Rotou, O. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Report %D 2001 %T CATSIB: A modified SIBTEST procedure to detect differential item functioning in computerized adaptive tests (Research report) %A Nandakumar, R. %A Roussos, L. %I Law School Admission Council %C Newton, PA %G eng %0 Journal Article %J Applied Psychological Measurement %D 2001 %T Computerized adaptive testing with the generalized graded unfolding model %A Roberts, J. S. %A Lin, Y. %A Laughlin, J. E. %K Attitude Measurement %K College Students computerized adaptive testing %K Computer Assisted Testing %K Item Response %K Models %K Statistical Estimation %K Theory %X Examined the use of the generalized graded unfolding model (GGUM) in computerized adaptive testing. The objective was to minimize the number of items required to produce equiprecise estimates of person locations. Simulations based on real data about college student attitudes toward abortion and on data generated to fit the GGUM were used. It was found that as few as 7 or 8 items were needed to produce accurate and precise person estimates using an expected a posteriori procedure. 
The number of items in the item bank (20, 40, or 60 items) and their distribution on the continuum (uniform locations or item clusters in moderately extreme locations) had only small effects on the accuracy and precision of the estimates. These results suggest that adaptive testing with the GGUM is a good method for achieving estimates with an approximately uniform level of precision using a small number of items. %B Applied Psychological Measurement %V 25 %P 177-196 %G eng %0 Book %D 2001 %T Development and evaluation of test assembly procedures for computerized adaptive testing %A Robin, F. %C Unpublished doctoral dissertation, University of Massachusetts, Amherst %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T The effect of test and examinee characteristics on the occurrence of aberrant response patterns in a computerized adaptive test %A Rizavi, S. %A Swaminathan, H. %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle, WA %G eng %0 Conference Paper %B Invited small group session at the 6th Conference of the European Association of Psychological Assessment %D 2001 %T Item pool design for computerized adaptive tests %A Reckase, M. D. %B Invited small group session at the 6th Conference of the European Association of Psychological Assessment %C Aachen, Germany %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T A Monte Carlo study of the feasibility of on-the-fly assessment %A Revuelta, J. %A Bejar, I. I. %A Stocking, M.
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle, WA %G eng %0 Conference Paper %B Communication présentée à l’intérieur de la 23e session d’études de l’Association pour le développement de la mesure et de l’évaluation en éducation %D 2001 %T Pour une évaluation sur mesure des étudiants : défis et enjeux du testing adaptatif %A Raîche, G. %B Communication présentée à l’intérieur de la 23e session d’études de l’Association pour le développement de la mesure et de l’évaluation en éducation %C ADMÉÉ %G eng %0 Conference Paper %B Presented at the 23rd Study Session of the ADMÉÉ. Québec: Association pour le développement de la mesure et de l’évaluation en éducation (ADMÉÉ). %D 2001 %T Pour une évaluation sur mesure pour chaque étudiant : défis et enjeux du testing adaptatif par ordinateur en éducation [Tailored testing for each student: Principles and stakes of computerized adaptive testing in education] %A Raîche, G. %A Blais, J.-G. %A Boiteau, N. %B Presented at the 23rd Study Session of the ADMÉÉ. Québec: Association pour le développement de la mesure et de l’évaluation en éducation (ADMÉÉ). %G eng %0 Conference Paper %B Communication présentée à l’intérieur du 69e congrès de l’Association canadienne française pour l’avancement de la science %D 2001 %T Principes et enjeux du testing adaptatif : de la loi des petits nombres à la loi des grands nombres %A Raîche, G. %B Communication présentée à l’intérieur du 69e congrès de l’Association canadienne française pour l’avancement de la science %C Acfas %G eng %0 Journal Article %J Apuntes de Psicología %D 2001 %T Requerimientos, aplicaciones e investigación en tests adaptativos informatizados [Requirements, applications, and investigation in computerized adaptive testing] %A Olea Díaz, J. %A Ponsoda Gil, V. %A Revuelta Menéndez, J. %A Hontangas Beltrán, P. %A Abad, F. J.
%K Computer Assisted Testing %K English as Second Language %K Psychometrics %K computerized adaptive testing %X Summarizes the main requirements and applications of computerized adaptive testing (CAT) with emphasis on the differences between CAT and conventional computerized tests. Psychometric properties of estimations based on CAT, item selection strategies, and implementation software are described. Results of CAT studies in Spanish-speaking samples are described. Implications for developing a CAT measuring the English vocabulary of Spanish-speaking students are discussed. %B Apuntes de Psicología %V 19 %P 11-28 %G eng %0 Generic %D 2001 %T STAR Early Literacy Computer-Adaptive Diagnostic Assessment: Technical Manual %A Renaissance-Learning-Inc. %C Wisconsin Rapids, WI: Author %G eng %0 Journal Article %J Psicothema %D 2000 %T Algoritmo mixto mínima entropía-máxima información para la selección de ítems en un test adaptativo informatizado %A Dorronsoro, J. R. %A Santa-Cruz, C. %A Rubio Franco, V. J. %A Aguado García, D. %K computerized adaptive testing %X El objetivo del estudio que presentamos es comparar la eficacia como estrategia de selección de ítems de tres algoritmos diferentes: a) basado en máxima información; b) basado en mínima entropía; y c) mixto, mínima entropía en los ítems iniciales y máxima información en el resto; bajo la hipótesis de que el algoritmo mixto puede dotar al TAI de mayor eficacia. Las simulaciones de procesos TAI se realizaron sobre un banco de 28 ítems de respuesta graduada calibrado según el modelo de Samejima, tomando como respuesta al TAI la respuesta original de los sujetos que fueron utilizados para la calibración. Los resultados iniciales muestran cómo el criterio mixto es más eficaz que cualquiera de los otros dos tomados independientemente. Dicha eficacia se maximiza cuando el algoritmo de mínima entropía se restringe a la selección de los primeros ítems del TAI, ya que con las respuestas a estos primeros ítems la estimación de θ comienza a ser relevante y el algoritmo de máxima información se optimiza. [Item selection algorithms in computerized adaptive testing. The aim of this paper is to compare the efficacy of three different item selection algorithms in computerized adaptive testing (CAT). These algorithms are based as follows: the first one is based on Item Information, the second one on Entropy, and the last algorithm is a mixture of the two previous ones. The CAT process was simulated using an emotional adjustment item bank. This item bank contains 28 graded items in six categories, calibrated using Samejima's (1969) Graded Response Model. The initial results show that the mixed criterion algorithm performs better than the other ones.] %B Psicothema %V 12 %P 12-14 %G eng %0 Generic %D 2000 %T CBTS: Computer-based testing simulation and analysis [computer software] %A Robin, F. %C Amherst, MA: University of Massachusetts, School of Education %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Classification accuracy and test security for a computerized adaptive mastery test calibrated with different IRT models %A Robin, F. %A Xing, D. %A Scrams, D. %A Potenza, M. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans, LA %G eng %0 Journal Article %J Assessment %D 2000 %T Computerization and adaptive administration of the NEO PI-R %A Reise, S. P. %A Henson, J. M.
%K *Personality Inventory %K Algorithms %K California %K Diagnosis, Computer-Assisted/*methods %K Humans %K Models, Psychological %K Psychometrics/methods %K Reproducibility of Results %X This study asks, how well does an item response theory (IRT) based computerized adaptive NEO PI-R work? To explore this question, real-data simulations (N = 1,059) were used to evaluate a maximum information item selection computerized adaptive test (CAT) algorithm. Findings indicated satisfactory recovery of full-scale facet scores with the administration of around four items per facet scale. Thus, the NEO PI-R could be reduced in half with little loss in precision by CAT administration. However, results also indicated that the CAT algorithm was not necessary. We found that for many scales, administering the "best" four items per facet scale would have produced similar results. In the conclusion, we discuss the future of computerized personality assessment and describe the role IRT methods might play in such assessments. %B Assessment %V 7 %P 347-64 %G eng %M 11151961 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Computerized testing – the adolescent years: Juvenile delinquent or positive role model %A Reckase, M. D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Generic %D 2000 %T Development and evaluation of test assembly procedures for computerized adaptive testing (Laboratory of Psychometric and Evaluative Methods Research Report No 391) %A Robin, F. %C Amherst MA: University of Massachusetts, School of Education. %G eng %0 Journal Article %J Applied Psychological Measurement %D 2000 %T An integer programming approach to item bank design %A van der Linden, W. J. %A Veldkamp, B. P. %A Reese, L. M. 
%K Aptitude Measures %K Item Analysis (Test) %K Item Response Theory %K Test Construction %K Test Items %X An integer programming approach to item bank design is presented that can be used to calculate an optimal blueprint for an item bank, in order to support an existing testing program. The results are optimal in that they minimize the effort involved in producing the items as revealed by current item writing patterns. Also presented is an adaptation of the models, which can be used as a set of monitoring tools in item bank management. The approach is demonstrated empirically for an item bank that was designed for the Law School Admission Test. %B Applied Psychological Measurement %V 24 %P 139-150 %G eng %0 Journal Article %J Medical Care %D 2000 %T Item response theory and health outcomes measurement in the 21st century %A Hays, R. D. %A Morales, L. S. %A Reise, S. P. %K *Models, Statistical %K Activities of Daily Living %K Data Interpretation, Statistical %K Health Services Research/*methods %K Health Surveys %K Human %K Mathematical Computing %K Outcome Assessment (Health Care)/*methods %K Research Design %K Support, Non-U.S. Gov't %K Support, U.S. Gov't, P.H.S. %K United States %X Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. 
These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. %B Medical Care %V 38 %P II28-II42 %G eng %M 10982088 %0 Journal Article %J Psicothema %D 2000 %T Item selection algorithms in computerized adaptive testing %A Garcia, David A. %A Santa Cruz, C. %A Dorronsoro, J. R. %A Rubio Franco, V. J. %X Studied the efficacy of 3 different item selection algorithms in computerized adaptive testing. Ss were 395 university students (aged 20-25 yrs) in Spain. Ss were asked to submit answers via computer to 28 items of a personality questionnaire using item selection algorithms based on maximum item information, entropy, or mixed item-entropy algorithms. The results were evaluated according to ability of Ss to use item selection algorithms and number of questions. Initial results indicate that mixed criteria algorithms were more efficient than information or entropy algorithms for up to 15 questionnaire items, but that differences in efficiency decreased with increasing item number. Implications for developing computer adaptive testing methods are discussed. %B Psicothema %V 12 %P 12-14 %G eng %0 Book %D 2000 %T La distribution d'échantillonnage en testing adaptatif en fonction de deux règles d'arrêt : selon l'erreur type et selon le nombre d'items administrés [Sampling distribution of the proficiency estimate in computerized adaptive testing according to two stopping... %A Raîche, G. %C Doctoral thesis, Montreal: University of Montreal %G eng %0 Journal Article %J Psicológica %D 2000 %T Psychometric and psychological effects of review on computerized fixed and adaptive tests %A Olea, J. %A Revuelta, J. %A Ximenez, M. C. %A Abad, F. J. %B Psicológica %V 21 %P 157-173 %G Spanish %0 Generic %D 2000 %T A selection procedure for polytomous items in computerized adaptive testing (Measurement and Research Department Reports 2000-5) %A Rijn, P. W.
van, %A Theo Eggen %A Hemker, B. T. %A Sanders, P. F. %C Arnhem, The Netherlands: Cito %G eng %0 Generic %D 2000 %T STAR Reading 2 Computer-Adaptive Reading Test and Database: Technical Manual %A Renaissance-Learning-Inc. %C Wisconsin Rapids, WI: Author %G eng %0 Conference Paper %B Paper presented at the National Council on Measurement in Education invited symposium: Maintaining test security in computerized programs–Implications for practice %D 2000 %T Test security and the development of computerized tests %A Guo, F. %A Way, W. D. %A Reshetar, R. %B Paper presented at the National Council on Measurement in Education invited symposium: Maintaining test security in computerized programs–Implications for practice %C New Orleans %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal %D 1999 %T Alternative item selection strategies for improving test security and pool usage in computerized adaptive testing %A Robin, F. %B Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal %C Canada %G eng %0 Journal Article %J Journal of Educational Measurement %D 1999 %T Can examinees use a review option to obtain positively biased ability estimates on a computerized adaptive test? %A Vispoel, W. P. %A Rocklin, T. R. %A Wang, T. %A Bleiler, T. %B Journal of Educational Measurement %V 36 %P 141-157 %G eng %0 Generic %D 1999 %T A comparison of testlet-based test designs for computerized adaptive testing (LSAC Computerized Testing Report 97-01) %A Schnipke, D. L. %A Reese, L. M. %C Newtown, PA: LSAC. %G eng %0 Journal Article %J Applied Measurement in Education %D 1999 %T The effects of test difficulty manipulation in computerized adaptive testing and self-adapted testing %A Ponsoda, V. %A Olea, J. %A Rodriguez, M. S. %A Revuelta, J.
%B Applied Measurement in Education %V 12 %P 167-184 %G eng %0 Report %D 1999 %T Incorporating content constraints into a multi-stage adaptive testlet design. %A Reese, L. M. %A Schnipke, D. L. %A Luebke, S. W. %X Most large-scale testing programs moving to computerized adaptive testing (CAT) face the challenge of maintaining extensive content requirements, but content constraints in CAT can compromise the precision and efficiency that could be achieved by a pure maximum-information adaptive testing algorithm. This simulation study first evaluated whether realistic content constraints could be met by carefully assembling testlets and appropriately selecting testlets for each test taker that, when combined, would meet the content requirements of the test and would be adapted to the test taker's ability level. The second focus of the study was to compare the precision of the content-balanced testlet design with that achieved by the current paper-and-pencil version of the test through data simulation. The results reveal that constraints to control for item exposure, testlet overlap, and efficient pool utilization need to be incorporated into the testlet assembly algorithm. More refinement of the statistical constraints for testlet assembly is also necessary. However, even for this preliminary attempt at assembling content-balanced testlets, the two-stage computerized test simulated with these testlets performed quite well. (Contains 5 figures, 5 tables, and 12 references.) (Author/SLD) %B LSAC Computerized Testing Report %I Law School Admission Council %C Princeton, NJ. USA %@ Series %G eng %M ED467816 %0 Journal Article %J Journal of Educational Measurement %D 1998 %T A comparison of item exposure control methods in computerized adaptive testing %A Revuelta, J. %A Ponsoda, V. %B Journal of Educational Measurement %V 35 %P 311-327 %G eng %0 Generic %D 1998 %T Item banking %A Rudner, L. M.
%X Discusses the advantages and disadvantages of using item banks while providing useful information to those who are considering implementing an item banking project in their school district. The primary advantage of item banking is in test development. Also describes start-up activities in implementing item banking. (SLD) %B Practical Assessment, Research and Evaluation %V 6 %G eng %M EJ670692 %0 Journal Article %J Applied Psychological Measurement %D 1998 %T A model for optimal constrained adaptive testing %A van der Linden, W. J. %A Reese, L. M. %K computerized adaptive testing %X A model for constrained computerized adaptive testing is proposed in which the information in the test at the trait level (θ) estimate is maximized subject to a number of possible constraints on the content of the test. At each item-selection step, a full test is assembled to have maximum information at the current θ estimate, fixing the items already administered. Then the item with maximum information is selected. All test assembly is optimal because a linear programming (LP) model is used that automatically updates to allow for the attributes of the items already administered and the new value of the θ estimator. The LP model also guarantees that each adaptive test always meets the entire set of constraints. A simulation study using a bank of 753 items from the Law School Admission Test showed that the θ estimator for adaptive tests of realistic lengths did not suffer any loss of efficiency from the presence of 433 constraints on the item selection process. %B Applied Psychological Measurement %V 22 %P 259-270 %G eng %0 Generic %D 1997 %T CATSIB: A modified SIBTEST procedure to detect differential item functioning in computerized adaptive tests (Research report) %A Nandakumar, R. %A Roussos, L.
%C Newtown, PA: Law School Admission Council %G eng %0 Conference Paper %B Paper presented at the meeting of American Educational Research Association %D 1997 %T A comparison of testlet-based test designs for computerized adaptive testing %A Schnipke, D. L. %A Reese, L. M. %B Paper presented at the meeting of American Educational Research Association %C Chicago, IL %G eng %0 Journal Article %J Quality of Life Research %D 1997 %T Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing %A Revicki, D. A. %A Cella, D. F. %K *Health Status %K *HIV Infections/diagnosis %K *Quality of Life %K Diagnosis, Computer-Assisted %K Disease Progression %K Humans %K Psychometrics/*methods %X Health status assessment is frequently used to evaluate the combined impact of human immunodeficiency virus (HIV) disease and its treatment on functioning and well-being from the patient's perspective. No single health status measure can efficiently cover the range of problems in functioning and well-being experienced across HIV disease stages. Item response theory (IRT), item banking and computer adaptive testing (CAT) provide a solution to measuring health-related quality of life (HRQoL) across different stages of HIV disease. IRT allows us to examine the response characteristics of individual items and the relationship between responses to individual items and the responses to each other item in a domain. With information on the response characteristics of a large number of items covering a HRQoL domain (e.g. physical function, and psychological well-being), and information on the interrelationships between all pairs of these items and the total scale, we can construct more efficient scales. Item banks consist of large sets of questions representing various levels of a HRQoL domain that can be used to develop brief, efficient scales for measuring the domain. 
CAT is the application of IRT and item banks to the tailored assessment of HRQoL domains specific to individual patients. Given the results of IRT analyses and computer-assisted test administration, more efficient and brief scales can be used to measure multiple domains of HRQoL for clinical trials and longitudinal observational studies. %B Quality of Life Research %7 1997/08/01 %V 6 %P 595-600 %8 Aug %@ 0962-9343 (Print) %G eng %M 9330558 %0 Conference Paper %B Paper presented at the Psychometric Society meeting %D 1997 %T Identifying similar item content clusters on multiple test forms %A Reckase, M. D. %A Thompson, T.D. %A Nering, M. %B Paper presented at the Psychometric Society meeting %C Gatlinburg, TN, June %G eng %0 Generic %D 1997 %T Incorporating content constraints into a multi-stage adaptive testlet design: LSAC report %A Reese, L. M. %A Schnipke, D. L. %A Luebke, S. W. %C Newtown, PA: Law School Admission Council %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1997 %T An investigation of self-adapted testing in a Spanish high school population %A Ponsoda, V. %A Wise, S. L. %A Olea, J. %A Revuelta, J. %B Educational and Psychological Measurement %V 57 %P 210-221 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1997 %T The role of item feedback in self-adapted testing %A Roos, L. L. %A Wise, S. L. %A Plake, B. S. %B Educational and Psychological Measurement %V 57 %P 85-98 %G eng %0 Journal Article %J Stress & Coping: An International Journal %D 1997 %T Self-adapted testing: Improving performance by modifying tests instead of examinees %A Rocklin, T. %X This paper describes self-adapted testing and some of the evidence concerning its effects, presents possible theoretical explanations for those effects, and discusses some of the practical concerns regarding self-adapted testing. 
Self-adapted testing is a variant of computerized adaptive testing in which the examinee makes dynamic choices about the difficulty of the items he or she attempts. Self-adapted testing generates scores that are, in contrast to computerized adaptive tests and fixed-item tests, uncorrelated with a measure of trait test anxiety. This lack of correlation with an irrelevant attribute of the examinee is evidence of an improvement in the construct validity of the scores. This improvement comes at the cost of a decrease in testing efficiency. The interaction between test anxiety and test administration mode is more consistent with an interference theory of test anxiety than a deficit theory. Some of the practical concerns regarding self-adapted testing can be ruled out logically, but others await empirical investigation. %B Stress & Coping: An International Journal %V 10(1) %P 83-104 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T A simulation study of the use of the Mantel-Haenszel and logistic regression procedures for assessing DIF in a CAT environment %A Ross, L. P. %A Nandakumar, R. %A Clauser, B. E. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago, IL %G eng %0 Journal Article %J Revista Electrónica de Metodología Aplicada %D 1997 %T Una solución a la estimación inicial en los tests adaptativos informatizados [A solution to initial estimation in CATs.] %A Revuelta, J. %A Ponsoda, V. %B Revista Electrónica de Metodología Aplicada %V 2 %P 1-6 %G Spanish %0 Conference Paper %B annual meeting of the American Educational Research Association %D 1997 %T Validation of CATSIB to investigate DIF of CAT data %A Nandakumar, R. %A Roussos, L. A.
%K computerized adaptive testing %X This paper investigates the performance of CATSIB (a modified version of the SIBTEST computer program) to assess differential item functioning (DIF) in the context of computerized adaptive testing (CAT). One of the distinguishing features of CATSIB is its theoretically built-in regression correction to control for the Type I error rates when the distributions of the reference and focal groups differ on the intended ability. This phenomenon is also called impact. The Type I error rate of CATSIB with the regression correction (WRC) was compared with that of CATSIB without the regression correction (WORC) to see if the regression correction was indeed effective. Also of interest was the power level of CATSIB after the regression correction. The subtest size was set at 25 items, and sample size, the impact level, and the amount of DIF were varied. Results show that the regression correction was very useful in controlling for the Type I error; CATSIB WORC had inflated observed Type I errors, especially when impact levels were high. The CATSIB WRC had observed Type I error rates very close to the nominal level of 0.05. The power rates of CATSIB WRC were impressive. As expected, the power increased as the sample size increased and as the amount of DIF increased. Even for small samples with high impact rates, power rates were 64% or higher for high DIF levels. For large samples, power rates were over 90% for high DIF levels. (Contains 12 tables and 7 references.) (Author/SLD) %B annual meeting of the American Educational Research Association %C Chicago, IL. USA %G eng %M ED409332 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T Can examinees use a review option to positively bias their scores on a computerized adaptive test? %A Rocklin, T. R. %A Vispoel, W. P.
%A Wang, T. %A Bleiler, T. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New York, NY %G eng %0 Journal Article %J Journal of Educational & Behavioral Statistics %D 1996 %T Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using a computerized test %A Spray, J. A. %A Reckase, M. D. %B Journal of Educational & Behavioral Statistics %V 21 %P 405-414 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1996 %T Conducting self-adapted testing using MicroCAT %A Roos, L. L. %A Wise, S. L. %A Yoes, M. E. %A Rocklin, T. R. %B Educational and Psychological Measurement %V 56 %P 821-827 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1996 %T An evaluation of a two-stage testlet design for computerized adaptive testing %A Reese, L. M. %A Schnipke, D. L. %B Paper presented at the annual meeting of the Psychometric Society %C Banff, Alberta, Canada %G eng %0 Journal Article %J Psicológica %D 1996 %T Métodos sencillos para el control de las tasas de exposición en tests adaptativos informatizados [Simple methods for item exposure control in CATs] %A Revuelta, J. %A Ponsoda, V. %B Psicológica %V 17 %P 161-172 %G Spanish %0 Journal Article %J Estudios de Psicología %D 1996 %T Propiedades psicométricas de un test adaptativo informatizado de vocabulario inglés [Psychometric properties of a computerized adaptive test for the measurement of English vocabulary] %A Olea, J. %A Ponsoda, V. %A Revuelta, J. %A Belchi, J. %B Estudios de Psicología %V 55 %P 61-73 %G Spanish %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1996 %T A Type I error rate study of a modified SIBTEST DIF procedure with potential application to computerized adaptive tests %A Roussos, L.
%B Paper presented at the annual meeting of the Psychometric Society %C Alberta, Canada %G eng %0 Journal Article %J Journal of Personality Assessment %D 1995 %T Comparability and validity of computerized adaptive testing with the MMPI-2 %A Roper, B. L. %A Ben-Porath, Y. S. %A Butcher, J. N. %X The comparability and validity of a computerized adaptive (CA) Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 571 undergraduate college students. The CA MMPI-2 administered adaptively Scales L and F, the 10 clinical scales, and the 15 content scales, utilizing the countdown method (Butcher, Keller, & Bacon, 1985). All subjects completed the MMPI-2 twice, with three experimental conditions: booklet test-retest, booklet-CA, and conventional computerized (CC)-CA. Profiles across administration modalities show a high degree of similarity, providing evidence for the comparability of the three forms. Correlations between MMPI-2 scales and other psychometric measures (Beck Depression Inventory; Symptom Checklist-Revised; State-Trait Anxiety and Anger Scales; and the Anger Expression Scale) support the validity of the CA MMPI-2. Substantial item savings may be realized with the implementation of the countdown procedure. %B Journal of Personality Assessment %7 1995/10/01 %V 65 %P 358-71 %8 Oct %@ 0022-3891 (Print) %G eng %M 16367721 %0 Journal Article %J Journal of the American Dietetic Association %D 1995 %T Computer-adaptive testing: A new breed of assessment %A Ruiz, B. %A Fitz, P. A. %A Lewis, C. %A Reidy, C. %B Journal of the American Dietetic Association %V 95 %P 1326-1327 %G eng %0 Journal Article %J Journal of Educational Psychology %D 1995 %T Effects and underlying mechanisms of self-adapted testing %A Rocklin, T. R. %A O’Donnell, A. M. %A Holst, P. M. %B Journal of Educational Psychology %V 87 %P 103-116 %G eng %0 Book %D 1995 %T El control de la exposición de los ítems en tests adaptativos informatizados [Item exposure control in computerized adaptive tests] %A Revuelta, J. %C Unpublished master’s dissertation, Universidad Autónoma de Madrid, Spain %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1994 %T ADTEST: A computer-adaptive test based on the maximum information principle %A Ponsoda, V. %A Olea, J. %A Revuelta, J. %B Educational and Psychological Measurement %V 54 %P 680-686 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1994 %T Comparing computerized adaptive and self-adapted tests: The influence of examinee achievement locus of control %A Wise, S. L. %A Roos, L. L. %A Plake, B. S. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans, LA %G eng %0 Journal Article %J Applied Measurement in Education %D 1994 %T Individual differences and test administration procedures: A comparison of fixed-item, computerized adaptive, and self-adapted testing %A Vispoel, W. P. %A Rocklin, T. R. %A Wang, T. %B Applied Measurement in Education %V 7 %P 53-79 %G eng %0 Generic %D 1994 %T La simulation de modèle sur ordinateur en tant que méthode de recherche : le cas concret de l’étude de la distribution d’échantillonnage de l’estimateur du niveau d’habileté en testing adaptatif en fonction de deux règles d’arrêt %A Raîche, G. %C Actes du 6e colloque de l’Association pour la recherche au collégial.
Montréal : Association pour la recherche au collégial, ARC %G eng %0 Conference Paper %B Québec: Proceedings of the 14th Congress of the Association québécoise de pédagogie collégiale. Montréal: Association québécoise de pédagogie collégiale (AQPC). %D 1994 %T L'évaluation nationale individualisée et assistée par ordinateur [Large scale assessment: Tailored and computerized] %A Raîche, G. %A Béland, A. %B Québec: Proceedings of the 14th Congress of the Association québécoise de pédagogie collégiale. Montréal: Association québécoise de pédagogie collégiale (AQPC). %G eng %0 Journal Article %J Applied Measurement in Education %D 1994 %T The relationship between examinee anxiety and preference for self-adapted testing %A Wise, S. L. %A Roos, L. L. %A Plake, B. S. %A Nebelsick-Gullett, L. J. %B Applied Measurement in Education %V 7 %P 81-91 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1994 %T The selection of test items for decision making with a computer adaptive test %A Reckase, M. D. %A Spray, J. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans, LA %G eng %0 Conference Paper %B Paper presented at the national meeting of the National Council on Measurement in Education %D 1994 %T The selection of test items for decision making with a computer adaptive test %A Spray, J. A. %A Reckase, M. D. %B Paper presented at the national meeting of the National Council on Measurement in Education %C New Orleans, LA %G eng %0 Journal Article %J Applied Measurement in Education %D 1994 %T Self-adapted testing %A Rocklin, T. R. %B Applied Measurement in Education %V 7 %P 3-14 %G eng %0 Book Section %D 1994 %T Utilisation de la simulation en tant que méthodologie de recherche [Simulation methodology in research] %A Raîche, G. %C Association pour la recherche au collégial (Ed.) : L'en-quête de la créativité [In quest of creativity].
Proceedings of the 6th Congress of the ARC. Montréal: Association pour la recherche au collégial (ARC). %G eng %0 Journal Article %J Dissertation Abstracts International %D 1993 %T Comparability and validity of computerized adaptive testing with the MMPI-2 %A Roper, B. L. %K computerized adaptive testing %B Dissertation Abstracts International %V 53 %P 3791 %G eng %0 Conference Paper %B Unpublished manuscript. %D 1993 %T Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using an adaptive test %A Spray, J. A. %A Reckase, M. D. %B Unpublished manuscript. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1993 %T Establishing time limits for the GRE computer adaptive tests %A Reese, C. M. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Atlanta GA %G eng %0 Generic %D 1993 %T Field test of a computer-based GRE general test (GRE Board Technical Report 88-8; Educational Testing Service Research Report No. RR-93-07) %A Schaeffer, G. A. %A Reese, C. M. %A Steffen, M. %A McKinley, R. L. %A Mills, C. N. %C Princeton NJ: Educational Testing Service. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1993 %T Individual differences and test administration procedures: A comparison of fixed-item, adaptive, and self-adapted testing %A Vispoel, W. P. %A Rocklin, T. R. %B Paper presented at the annual meeting of the American Educational Research Association %C Atlanta GA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1993 %T A simulated comparison of testlets and a content balancing procedure for an adaptive certification examination %A Reshetar, R. A. %A Norcini, J. J. %A Shea, J. A. 
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C Atlanta %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1993 %T A simulated comparison of two content balancing and maximum information item selection procedures for an adaptive certification examination %A Reshetar, R. A. %A Norcini, J. J. %A Shea, J. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Atlanta %G eng %0 Journal Article %J Journal of Educational Measurement %D 1992 %T A comparison of self-adapted and computerized adaptive achievement tests %A Wise, S. L. %A Plake, B. S. %A Johnson, P. L. %A Roos, L. L. %B Journal of Educational Measurement %V 29 %P 329-339 %G eng %0 Conference Paper %B Paper presented at the 27th Annual Symposium on Recent Developments in the MMPI/MMPI-2 %D 1992 %T Computerized adaptive testing with the MMPI-2: Reliability, validity, and comparability to paper and pencil administration %A Ben-Porath, Y. S. %A Roper, B. L. %B Paper presented at the 27th Annual Symposium on Recent Developments in the MMPI/MMPI-2 %C Minneapolis MN %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1992 %T Effects of feedback during self-adapted testing on estimates of ability %A Holst, P. M. %A O’Donnell, A. M. %A Rocklin, T. R. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1992 %T The effects of feedback in computerized adaptive and self-adapted tests %A Roos, L. L. %A Plake, B. S. %A Wise, S. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco %G eng %0 Journal Article %J Journal of Personality Assessment %D 1991 %T Comparability of computerized adaptive and conventional testing with the MMPI-2 %A Roper, B. 
L. %A Ben-Porath, Y. S. %A Butcher, J. N. %X A computerized adaptive version and the standard version of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were administered 1 week apart to a sample of 155 college students to assess the comparability of the two versions. The countdown method was used to adaptively administer Scales L, F, the 10 clinical scales, and the 15 new content scales. Profiles across administration modalities show a high degree of similarity, providing evidence for the comparability of computerized adaptive and conventional testing with the MMPI-2. Substantial item savings were found with the adaptive version. Future directions in the study of adaptive testing with the MMPI-2 are discussed. %B Journal of Personality Assessment %7 1991/01/01 %V 57 %P 278-290 %8 Oct %@ 0022-3891 (Print) %G eng %M 16370884 %0 Journal Article %J Mid-Western Educational Researcher %D 1991 %T Correlates of examinee item choice behavior in self-adapted testing %A Johnson, J. L. %A Roos, L. L. %A Wise, S. L. %A Plake, B. S. %B Mid-Western Educational Researcher %V 4 %P 25-28 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1991 %T An empirical comparison of self-adapted and maximum information item selection %A Rocklin, T. R. %A O’Donnell, A. M. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago %G eng %0 Generic %D 1991 %T Patterns of alcohol and drug use among federal offenders as assessed by the Computerized Lifestyle Screening Instrument %A Robinson, D. %A Porporino, F. J. %A Millson, W. A. %K computerized adaptive testing %K drug abuse %K substance use %I Research and Statistics Branch, Correctional Service of Canada %C Ottawa, ON, Canada %@ R-11 %G eng %0 Journal Article %J Journal of Marketing Research %D 1990 %T Adaptive designs for Likert-type data: An approach for implementing marketing research %A Singh, J. %A Howell, R. D. 
%A Rhoads, G. K. %B Journal of Marketing Research %V 27 %P 304-321 %G eng %0 Journal Article %J Educational Measurement: Issues and Practice %D 1990 %T Computer testing: Pragmatic issues and research needs %A Rudner, L. M. %B Educational Measurement: Issues and Practice %V 9 (2) %P 19-20 %G eng %0 Conference Paper %B Paper presented at the 25th Annual Symposium on Recent Developments in the MMPI/MMPI-2 %D 1990 %T An empirical study of the computer adaptive MMPI-2 %A Ben-Porath, Y. S. %A Roper, B. L. %A Butcher, J. N. %B Paper presented at the 25th Annual Symposium on Recent Developments in the MMPI/MMPI-2 %C Minneapolis MN %G eng %0 Conference Paper %B Paper presented at the 98th Annual Meeting of the American Psychological Association %D 1990 %T Illustration of computerized adaptive testing with the MMPI-2 %A Roper, B. L. %A Ben-Porath, Y. S. %A Butcher, J. N. %B Paper presented at the 98th Annual Meeting of the American Psychological Association %C Boston MA %G eng %0 Journal Article %J Educational Measurement: Issues and Practice %D 1989 %T Adaptive testing: The evolution of a good idea %A Reckase, M. D. %K computerized adaptive testing %B Educational Measurement: Issues and Practice %V 8 %P 11-15 %@ 1745-3992 %G eng %0 Generic %D 1989 %T Computerized adaptive tests %A Grist, S. %A Rudner, L. M. %A Wise %C ERIC Clearinghouse on Tests, Measurement, and Evaluation, no. 107 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1989 %T Individual differences in item selection in self adaptive testing %A Rocklin, T. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco CA %G eng %0 Generic %D 1989 %T The interpretation and application of multidimensional item response theory models; and computerized testing in the instructional environment: Final Report (Research Report ONR 89-2) %A Reckase, M. D. 
%C Iowa City IA: The American College Testing Program %G eng %0 Conference Paper %B Paper presented at the meeting of the American Educational Research Association %D 1988 %T Computerized adaptive testing: A good idea waiting for the right technology %A Reckase, M. D. %B Paper presented at the meeting of the American Educational Research Association %C New Orleans, April 1988 %G eng %0 Conference Paper %B Unpublished manuscript. %D 1988 %T Fitting the two-parameter model to personality data: The parameterization of the Multidimensional Personality Questionnaire %A Reise, S. P. %A Waller, N. G. %B Unpublished manuscript. %G eng %0 Journal Article %J Professional Psychology: Research and Practice %D 1987 %T Computerized psychological testing: Overview and critique %A Burke, M. J. %A Normand, J. %A Raju, N. M. %B Professional Psychology: Research and Practice %V 1 %P 42-51 %G eng %0 Report %D 1987 %T The effect of item parameter estimation error on decisions made using the sequential probability ratio test %A Spray, J. A. %A Reckase, M. D. %K computerized adaptive testing %K Sequential probability ratio test %B ACT Research Report Series 87-17 %I DTIC Document %C Iowa City, IA, USA %G eng %0 Journal Article %J Journal of Educational Psychology %D 1987 %T Self-adapted testing: A performance improving variation of computerized adaptive testing %A Rocklin, T. R. %A O’Donnell, A. M. %B Journal of Educational Psychology %V 79 %P 315-319 %G eng %0 Generic %D 1986 %T Final report: The use of tailored testing with instructional programs (Research Report ONR 86-1) %A Reckase, M. D. %C Iowa City IA: The American College Testing Program, Assessment Programs Area, Test Development Division. 
%G eng %0 Generic %D 1984 %T Evaluation plan for the computerized adaptive vocational aptitude battery (Research Report 82-1) %A Green, B. F. %A Bock, R. D. %A Humphreys, L. G. %A Linn, R. L. %A Reckase, M. D. %G eng %0 Journal Article %J Journal of Educational Measurement %D 1984 %T A plan for scaling the computerized adaptive Armed Services Vocational Aptitude Battery %A Green, B. F. %A Bock, R. D. %A Linn, R. L. %A Lord, F. M. %A Reckase, M. D. %B Journal of Educational Measurement %V 21 %P 347-360 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1984 %T Predictive validity of computerized adaptive testing in a military training environment %A Sympson, J. B. %A Weiss, D. J. %A Ree, M. J. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1984 %T The selection of items for decision making with a computer adaptive test %A Spray, J. A. %A Reckase, M. D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Journal Article %J Journal of Educational Measurement %D 1984 %T Technical guidelines for assessing computerized adaptive tests %A Green, B. F. %A Bock, R. D. %A Humphreys, L. G. %A Linn, R. L. %A Reckase, M. D. %K computerized adaptive testing %K Mode effects %K paper-and-pencil %B Journal of Educational Measurement %V 21 %P 347-360 %@ 1745-3984 %G eng %0 Generic %D 1983 %T An evaluation of one- and three-parameter logistic tailored testing procedures for use with small item pools (Research Report ONR 83-1) %A McKinley, R. L. %A Reckase, M. D. 
%C Iowa City IA: American College Testing Program %G eng %0 Book Section %B New horizons in testing: Latent trait theory and computerized adaptive testing %D 1983 %T A procedure for decision making using tailored testing %A Reckase, M. D. %K CCAT %K CLASSIFICATION Computerized Adaptive Testing %K sequential probability ratio testing %K SPRT %B New horizons in testing: Latent trait theory and computerized adaptive testing %I Academic Press %C New York, NY, USA %P 237-254 %G eng %0 Book Section %D 1982 %T Discussion: Adaptive and sequential testing %A Reckase, M. D. %C D. J. Weiss (Ed.). Proceedings of the 1982 Computerized Adaptive Testing Conference (pp. 290-294). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng %0 Generic %D 1982 %T Predictive validity of conventional and adaptive tests in an Air Force training environment (Report AFHRL-TR-81-40) %A Sympson, J. B. %A Weiss, D. J. %A Ree, M. J. %C Brooks Air Force Base TX: Air Force Human Resources Laboratory, Manpower and Personnel Division %G eng %0 Generic %D 1981 %T Adaptive testing without a computer %A Friedman, D. %A Steinberg, A. %A Ree, M. J. %C Catalog of Selected Documents in Psychology, Nov 1981, 11, 74-75 (Ms. No. 2350). AFHRL Technical Report 80-66. %G eng %0 Report %D 1981 %T A comparison of a Bayesian and a maximum likelihood tailored testing procedure %A McKinley, R. L. %A Reckase, M. D. %B Research Report 81-2 %I University of Missouri, Department of Educational Psychology, Tailored Testing Research Laboratory %C Columbia MO %G eng %9 Technical report %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1981 %T A comparison of a maximum likelihood and a Bayesian estimation procedure for tailored testing %A Rosso, M. A. %A Reckase, M. D. 
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C Los Angeles CA %G eng %0 Journal Article %J Applied Psychological Measurement %D 1981 %T The Effects of Item Calibration Sample Size and Item Pool Size on Adaptive Testing %A Ree, M. J. %B Applied Psychological Measurement %V 5 %P 11-19 %G English %N 1 %0 Generic %D 1981 %T Final report: Procedures for criterion referenced tailored testing %A Reckase, M. D. %C Columbia: University of Missouri, Educational Psychology Department %G eng %0 Generic %D 1981 %T The use of the sequential probability ratio test in making grade classifications in conjunction with tailored testing (Research Report 81-4) %A Reckase, M. D. %C Columbia MO: University of Missouri, Department of Educational Psychology %G eng %0 Journal Article %J Association for Educational Data Systems Journal %D 1980 %T Computer applications to ability testing %A McKinley, R. L. %A Reckase, M. D. %B Association for Educational Data Systems Journal %V 13 %P 193-203 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1980 %T Effects of program parameters and item pool characteristics on the bias of a three-parameter tailored testing procedure %A Patience, W. M. %A Reckase, M. D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Boston MA %G eng %0 Generic %D 1980 %T Final Report: Computerized adaptive testing, assessment of requirements %A Rehab Group, Inc. %C Falls Church VA: Author %G eng %0 Journal Article %J Catalog of Selected Documents in Psychology %D 1980 %T Operational characteristics of a one-parameter tailored testing procedure %A Patience, W. M. %A Reckase, M. D. %B Catalog of Selected Documents in Psychology %V August 1980 %P 10, 66 (Ms. No. 2104) %G eng %0 Book Section %D 1980 %T Some decision procedures for use with tailored testing %A Reckase, M. D. %C D. J. 
Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 79-100). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. %G eng %0 Generic %D 1980 %T A successful application of latent trait theory to tailored achievement testing (Research Report 80-1) %A McKinley, R. L. %A Reckase, M. D. %C University of Missouri, Department of Educational Psychology, Tailored Testing Research Laboratory %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1979 %T Operational characteristics of a Rasch model tailored testing procedure when program parameters and item pool attributes are varied %A Patience, W. M. %A Reckase, M. D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco %G eng %0 Generic %D 1979 %T Problems in application of latent-trait models to tailored testing (Research Report 79-1) %A Koch, W. J. %A Reckase, M. D. %C Columbia MO: University of Missouri, Department of Psychology (also presented at the annual meeting of the National Council on Measurement in Education, 1979; ERIC No. ED 177 196) %G eng %0 Conference Paper %B Paper presented at the meeting of the Military Testing Association %D 1978 %T A generalization of sequential analysis to decision making with tailored testing %A Reckase, M. D. %B Paper presented at the meeting of the Military Testing Association %C Oklahoma City OK %G eng %0 Generic %D 1978 %T A live tailored testing comparison study of the one- and three-parameter logistic models (Research Report 78-1) %A Koch, W. J. %A Reckase, M. D. %C Columbia MO: University of Missouri, Department of Psychology %G eng %0 Journal Article %J Behavior Research Methods and Instrumentation %D 1977 %T Application of tailored testing to achievement measurement %A English, R. A. %A Reckase, M. D. %A Patience, W. M. 
%B Behavior Research Methods and Instrumentation %V 9 %P 158-161 %G eng %0 Generic %D 1977 %T Flexilevel adaptive testing paradigm: Validation in technical training %A Hansen, D. N. %A Ross, S. %A Harris, D. A. %C AFHRL Technical Report 77-35 (I) %G eng %0 Generic %D 1977 %T Flexilevel adaptive training paradigm: Hierarchical concept structures %A Hansen, D. N. %A Ross, S. %A Harris, D. A. %C AFHRL Technical Report 77-35 (II) %G eng %0 Book Section %D 1977 %T Implementation of a Model Adaptive Testing System at an Armed Forces Entrance and Examination Station %A Ree, M. J. %C D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. %0 Journal Article %J Behavior Research Methods and Instrumentation %D 1977 %T Procedures for computerized testing %A Reckase, M. D. %B Behavior Research Methods and Instrumentation %V 70 %P 351-356 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1976 %T The effect of item pool characteristics on the operation of a tailored testing procedure %A Reckase, M. D. %B Paper presented at the annual meeting of the Psychometric Society %C Murray Hill NJ %G eng %0 Generic %D 1976 %T Monte carlo results from a computer program for tailored testing (Technical Report No. 2) %A Cudeck, R. A. %A Cliff, N. A. %A Reynolds, T. J. %A McCormick, D. J. %C Los Angeles CA: University of California, Department of Psychology. %G eng %0 Conference Paper %B Paper presented at the sixth annual meeting of the National Conference on the Use of On-Line Computers in Psychology %D 1976 %T Procedures for computerized testing %A Reckase, M. D. %B Paper presented at the sixth annual meeting of the National Conference on the Use of On-Line Computers in Psychology %C St. 
Louis MO %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1975 %T The effect of item choice on ability estimation when using a simple logistic tailored testing model %A Reckase, M. D. %B Paper presented at the annual meeting of the American Educational Research Association %C Washington, D.C. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1974 %T An application of the Rasch simple logistic model to tailored testing %A Reckase, M. D. %B Paper presented at the annual meeting of the American Educational Research Association %C St. Louis MO %G eng %0 Generic %D 1974 %T Development of a programmed testing system (Technical Paper 259) %A Bayroff, A. G. %A Ross, R. M. %A Fischl, M. A. %C Arlington VA: US Army Research Institute for the Behavioral and Social Sciences. (NTIS No. AD A001534) %G eng %0 Journal Article %J Behavior Research Methods and Instrumentation %D 1974 %T An interactive computer program for tailored testing based on the one-parameter logistic model %A Reckase, M. D. %B Behavior Research Methods and Instrumentation %V 6 %P 208-212 %G eng %0 Conference Paper %B Paper presented at the National Conference on the Use of On-Line Computers in Psychology %D 1973 %T An interactive computer program for tailored testing based on the one-parameter logistic model %A Reckase, M. D. %B Paper presented at the National Conference on the Use of On-Line Computers in Psychology %C St. Louis MO %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1972 %T Sequential testing for dichotomous decisions %A Linn, R. L. %A Rock, D. A. %A Cleary, T. A. %K CCAT %K CLASSIFICATION Computerized Adaptive Testing %K sequential probability ratio testing %K SPRT %B Educational and Psychological Measurement %V 32 %P 85-95 %G eng %0 Generic %D 1970 %T Sequential testing for dichotomous decisions. 
College Entrance Examination Board Research and Development Report (RDR 69-70, No. 3, and Educational Testing Service RB-70-31) %A Linn, R. L. %A Rock, D. A. %A Cleary, T. A. %C Princeton NJ: Educational Testing Service. %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1969 %T An exploratory study of programmed tests %A Cleary, T. A. %A Linn, R. L. %A Rock, D. A. %B Educational and Psychological Measurement %V 28 %P 345-360 %G eng %0 Generic %D 1968 %T The development and evaluation of several programmed testing methods (Research Bulletin 68-5) %A Linn, R. L. %A Rock, D. A. %A Cleary, T. A. %C Princeton NJ: Educational Testing Service %G eng %0 Journal Article %J Journal of Educational Measurement %D 1968 %T Reproduction of total test score through the use of sequential programmed tests %A Cleary, T. A. %A Linn, R. L. %A Rock, D. A. %B Journal of Educational Measurement %V 5 %P 183-187 %G eng %0 Book %D 1961 %T An analysis of the application of utility theory to the development of two-stage testing models %A Rosenbach, J. H. %C Unpublished doctoral dissertation, University of Buffalo %G eng