00542nas a2200169 4500008004500000245006600045210006600111300001000177490000700187653002100194653000800215653000800223653003000231653001300261100002000274856007800294 2023 Engldsh 00aExpanding the Meaning of Adaptive Testing to Enhance Validity0 aExpanding the Meaning of Adaptive Testing to Enhance Validity a22-310 v1010aAdaptive Testing10aCAT10aCBT10atest-taking disengagement10avalidity1 aWise, Steven, L uhttp://mail.iacat.org/expanding-meaning-adaptive-testing-enhance-validity00671nas a2200193 4500008004500000022001400045245007100059210006700130490000700197653002100204653002900225653002500254653003900279653001600318100001400334700002200348700002400370856008300394 2023 Engldsh a2165-659200aAn Extended Taxonomy of Variants of Computerized Adaptive Testing0 aExtended Taxonomy of Variants of Computerized Adaptive Testing0 v1010aAdaptive Testing10aevidence-centered design10aItem Response Theory10aknowledge-based model construction10amissingness1 aLevy, Roy1 aBehrens, John, T.1 aMislevy, Robert, J. uhttp://mail.iacat.org/extended-taxonomy-variants-computerized-adaptive-testing00733nas a2200193 4500008004500000245009100045210006900136300001000205490000700215653003500222653003400257653002900291653002600320100001900346700002700365700002400392700002100416856010200437 2023 Engldsh 00aHow Do Trait Change Patterns Affect the Performance of Adaptive Measurement of Change?0 aHow Do Trait Change Patterns Affect the Performance of Adaptive a32-580 v1010aadaptive measurement of change10acomputerized adaptive testing10alongitudinal measurement10atrait change patterns1 aTai, Ming, Him1 aCooperman, Allison, W.1 aDeWeese, Joseph, N.1 aWeiss, David, J. 
uhttp://mail.iacat.org/how-do-trait-change-patterns-affect-performance-adaptive-measurement-change01609nas a2200205 4500008004500000022001400045245005000059210004900109300001000158490000600168520099300174653003501167653003401202653002301236653001901259653002701278100002301305700001501328856006001343 2019 Engldsh a2165-659200aTime-Efficient Adaptive Measurement of Change0 aTimeEfficient Adaptive Measurement of Change a15-340 v73 a
The adaptive measurement of change (AMC) refers to the use of computerized adaptive testing (CAT) at multiple occasions to efficiently assess a respondent’s improvement, decline, or sameness from occasion to occasion. Whereas previous AMC research focused on administering the most informative item to a respondent at each stage of testing, the current research proposes the use of Fisher information per time unit as an item selection procedure for AMC. The latter procedure incorporates not only the amount of information provided by a given item but also the expected amount of time required to complete it. In a simulation study, the use of Fisher information per time unit item selection resulted in a lower false positive rate in the majority of conditions studied, and a higher true positive rate in all conditions studied, compared to item selection via Fisher information without accounting for the expected time taken. Future directions of research are suggested.
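The Fisher-information-per-time-unit selection rule described in this abstract can be sketched compactly. Below is an illustrative Python sketch assuming a 2PL item model and a known expected completion time per item; the function names, parameterization, and time model are assumptions for illustration, not details taken from the article:

```python
import math

def fisher_info_2pl(theta, a, b):
    # 2PL Fisher information at theta: a^2 * P(theta) * (1 - P(theta))
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_item(theta, items):
    # Pick the item maximizing information per expected time unit.
    # items: dicts with discrimination 'a', difficulty 'b', and
    # expected completion time 'exp_time' (e.g., seconds).
    return max(items,
               key=lambda it: fisher_info_2pl(theta, it["a"], it["b"]) / it["exp_time"])
```

Dividing information by expected time can reverse the ranking produced by plain Fisher information: a highly informative but slow item may lose to a moderately informative fast one.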
10aadaptive measurement of change10acomputerized adaptive testing10aFisher information10aitem selection10aresponse-time modeling1 aFinkelman, Matthew1 aWang, Chun uhttp://iacat.org/jcat/index.php/jcat/article/view/73/3503114nas a2200145 4500008004100000245004900041210004500090260005500135520264100190653002802831653000802859653002102867100001702888856006302905 2017 eng d00aIs CAT Suitable for Automated Speaking Test?0 aCAT Suitable for Automated Speaking Test aNiigata, JapanbNiigata Seiryo Universityc08/20173 aWe have developed an automated scoring system of Japanese speaking proficiency, namely SJ-CAT (Speaking Japanese Computerized Adaptive Test), which has been operational for the last few months. One of the unique features of the test is that it is an adaptive test based on polytomous IRT.
SJ-CAT consists of two sections: Section 1 has sentence reading aloud tasks and multiple choice reading tasks, and Section 2 has sentence generation tasks and open answer tasks. In a reading aloud task, a test taker reads a phoneme-balanced sentence on the screen after listening to a model reading. In a multiple choice reading task, a test taker sees a picture and reads aloud the one sentence of the three on the screen that describes the scene most appropriately. In a sentence generation task, a test taker sees a picture or watches a video clip and describes the scene in his/her own words for about ten seconds. In an open answer task, the test taker expresses support for or opposition to, e.g., nuclear power generation, with reasons, for about 30 seconds.
In the course of developing the test, we found many unexpected and unique characteristics of speaking CAT that are not found in usual CATs with multiple-choice items. In this presentation, we will discuss some factors that went unnoticed in our previous project of developing the dichotomous J-CAT (Japanese Computerized Adaptive Test), which consists of vocabulary, grammar, reading, and listening sections. Firstly, we will claim that the distribution of item difficulty parameters depends on the type of item. With an item pool containing unrestricted item types such as open questions, it is difficult to achieve an ideal distribution, whether normal or uniform. Secondly, contrary to our expectations, open questions are not necessarily more difficult to handle in an automated scoring system than more restricted tasks such as sentence reading, as long as one can set up a suitable scoring algorithm for open questions. Thirdly, we will show that the standard deviation of the posterior distribution (the standard error of the theta parameter) in the polytomous IRT model used for SJ-CAT converges faster than in the dichotomous IRT model used in J-CAT. Fourthly, we will discuss problems in the equating of items in SJ-CAT and suggest introducing deep learning with reinforcement learning in place of equating. Finally, we will discuss issues in operating SJ-CAT on the web, including scoring speed, operation costs, and security, among others.
10aAutomated Speaking Test10aCAT10alanguage testing1 aImai, Shingo uhttp://mail.iacat.org/cat-suitable-automated-speaking-test03453nas a2200157 4500008004100000245008400041210006900125260005500194520283600249653001203085653002503097653002803122100002303150700002003173856010203193 2017 eng d00aConsiderations in Performance Evaluations of Computerized Formative Assessments0 aConsiderations in Performance Evaluations of Computerized Format aNiigata, JapanbNiigata Seiryo Universityc08/20173 aComputerized adaptive instruments have been widely established and used in the context of summative assessments for purposes including licensure, admissions, and proficiency testing. The benefits of examinee-tailored examinations, which can provide estimates of performance that are more reliable and valid, have in recent years attracted a wider audience (e.g., patient-oriented outcomes, test prep). Formative assessments, which are most widely understood in their implementation as diagnostic tools, have recently started to expand into lesser-known areas of computerized testing, such as implementations of instructional designs aiming to maximize examinee learning through targeted practice.
Using a CAT instrument within the framework of evaluating repeated examinee performances (in settings such as Quiz Bank practice, for example) poses unique challenges not germane to summative assessments. The scale on which item parameters (and subsequently examinee performance estimates, such as maximum likelihood estimates) are determined usually does not take change over time into consideration. While vertical scaling features resolve the learning-acquisition problem, most content practice engines do not make use of explicit practice windows that could be vertically aligned. Alternatively, multidimensional (MIRT) and hierarchical item response theory (HIRT) models allow for the specification of random effects associated with change over time in examinees' skills, but they are often complex and require content and usage resources not often available.
The research submitted for consideration simulated examinees' repeated variable-length Quiz Bank practice in algebra using a 500-item 1-PL operational item pool. The stability simulations sought to determine which rolling item interval size would yield ability estimates providing the most informative insight into the examinees' learning progression over time. Estimates were evaluated in terms of reduction in estimate uncertainty, bias, and RMSD with respect to the true and total-item-based ability estimates. It was found that rolling item intervals of 20-25 items provided the best reduction of uncertainty around the estimate without compromising the ability to provide informed performance estimates to students. However, while asymptotically intervals of 20-25 items tended to provide adequate estimates of performance, changes over shorter periods of time assessed with shorter quizzes could not be detected, as those changes would be suppressed in lieu of the performance based on the full interval considered. Implications for infrastructure (such as recommendation engines), product, and scale development are discussed.
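The rolling-interval estimation described above can be sketched as follows. This is a minimal illustration assuming a Rasch (1-PL) model and Newton-Raphson maximum likelihood estimation — the abstract does not specify the estimator — and it assumes each window mixes correct and incorrect responses (the MLE is undefined for all-correct or all-wrong windows):

```python
import math

def rasch_mle(responses, difficulties, iters=25):
    # Newton-Raphson MLE of theta under the Rasch model for one window
    # of 0/1 responses to items with known difficulties.
    theta = 0.0
    for _ in range(iters):
        ps = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        grad = sum(u - p for u, p in zip(responses, ps))   # score function
        info = sum(p * (1.0 - p) for p in ps)              # test information
        theta += grad / info
    return theta

def rolling_estimates(responses, difficulties, window=20):
    # Re-estimate theta after each item, using only the last `window` items.
    out = []
    for k in range(1, len(responses) + 1):
        lo = max(0, k - window)
        out.append(rasch_mle(responses[lo:k], difficulties[lo:k]))
    return out
```

With a 20-25 item window, each new quiz response drops the oldest item out of the estimation span, which is what lets the estimate track learning while bounding its variance.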
10aalgebra10aFormative Assessment10aPerformance Evaluations1 aChajewski, Michael1 aHarnisher, John uhttp://mail.iacat.org/considerations-performance-evaluations-computerized-formative-assessments-003807nas a2200145 4500008004100000245008500041210006900126260005500195520326300250653002503513653001503538653001203553100002503565856007103590 2017 eng d00aFastCAT – Customizing CAT Administration Rules to Increase Response Efficiency0 aFastCAT Customizing CAT Administration Rules to Increase Respons aNiigata, JapanbNiigata Seiryo Universityc08/20173 aA typical prerequisite for CAT administration is the existence of an underlying item bank completely covering the range of the trait being measured. When a bank fails to cover the full range of the trait, examinees who are close to the floor or ceiling will often never reach a standard error cut-off and will be forced to answer items increasingly less relevant to their trait level. This scenario is fairly typical for many patients responding to patient-reported outcome measures (PROMs). For example, in the assessment of physical functioning, many item banks have a ceiling at about the 50th percentile. For most healthy patients, after a few items the only items remaining in the bank will represent decreasing ability (even though the patient has already indicated that they are at or above the mean for the population). Another example would be a patient with no pain taking a Pain CAT: they will probably answer "Never" for every succeeding pain item out to the maximum test length. For this project we sought to reduce patient burden, while maintaining test accuracy, by reducing CAT length using novel stopping rules.
We studied CAT administration assessment histories for patients who were administered Patient-Reported Outcomes Measurement Information System (PROMIS) CATs. In the PROMIS 1 Wave 2 Back Pain/Depression Study, CATs were administered to N=417 cases assessed across 11 PROMIS domains. The original CAT administration rules were: start with a pre-identified item of moderate difficulty; administer a minimum of four items per case; stop when the estimated theta's SE declines to < 0.3 or when a maximum of 12 items has been administered.
Original CAT. 12,622 CAT administrations were analyzed. CATs ranged in length from 4 to 12 items; 72.5% were 4-item CATs. The second and third most frequently occurring lengths were 5-item (n=1,102; 8.7%) and 12-item CATs (n=964; 7.6%). 64,062 items in total were administered, averaging 5.1 items per CAT. Customized CAT. Three new CAT stopping rules were introduced, each with the potential to increase item-presentation efficiency while maintaining the required score precision: stop if a case responds to the first two items administered using an "extreme" response category (towards the ceiling or floor of the item bank); administer a minimum of two items per case; stop if the change in the SE estimate (from the previous to the current item administration) is positive but < 0.01.
The three new stopping rules reduced the total number of items administered by 25,643, to 38,419 items (a 40.0% reduction). After four items were administered, only n=1,824 CATs (14.5%) were still in assessment mode (vs. n=3,477 (27.5%) in the original CATs). On average, cases completed 3.0 items per CAT (vs. 5.1).
Each new rule addressed a specific inefficiency in the original CAT administration process: cases not having, or possessing only a low or clinically unimportant level of, the assessed domain; allowing the SE < 0.3 stopping criterion to come into effect earlier in the administration; and cases experiencing poor domain item bank measurement (e.g., "floor" and "ceiling" cases).
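The customized rules, together with the original SE cutoff and test-length limits, can be combined into a single stopping check. The sketch below is illustrative: the function and parameter names are assumptions, and response categories are coded 0-4 with 0 and 4 treated as "extreme":

```python
def should_stop(responses, se_history, min_items=2, max_items=12,
                se_cutoff=0.3, extreme_categories=(0, 4)):
    # responses: response categories administered so far
    # se_history: SE(theta) estimate recorded after each item
    n = len(responses)
    if n < min_items:
        return False
    # Rule 1: first two responses both in an extreme category
    if all(r in extreme_categories for r in responses[:2]):
        return True
    # Original rule: SE below the precision cutoff
    if se_history and se_history[-1] < se_cutoff:
        return True
    # Rule 3: SE increased, but by less than 0.01 (diminishing returns)
    if len(se_history) >= 2:
        delta = se_history[-1] - se_history[-2]
        if 0 < delta < 0.01:
            return True
    return n >= max_items
```

A driver loop would call this after every administered item and end the CAT as soon as it returns True.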
10aAdministration Rules10aEfficiency10aFastCAT1 aGershon, Richard, C. uhttps://drive.google.com/open?id=1oPJV-x0p9hRmgJ7t6k-MCC1nAoBSFM1w03003nas a2200157 4500008004100000245007600041210006900117260005500186520243700241653002102678653002302699653002002722100001602742700001602758856007102774 2017 eng d00aGenerating Rationales to Support Formative Feedback in Adaptive Testing0 aGenerating Rationales to Support Formative Feedback in Adaptive aNiigata, JapanbNiigata Seiryo Universityc08/20173 aComputer adaptive testing offers many important benefits to support and promote life-long learning. Computers permit testing on demand, thereby allowing students to take the test at any time during instruction; items on computerized tests are scored immediately, thereby providing students with instant feedback; computerized tests permit continuous administration, thereby allowing students more choice about when they write their exams. But despite these important benefits, the advent of computer adaptive testing has also raised formidable challenges, particularly in the area of item development. Educators must have access to large numbers of diverse, high-quality test items to implement computerized adaptive testing because items are continuously administered to students. Hence, hundreds or even thousands of items are needed to develop the item banks necessary for computer adaptive testing. Unfortunately, educational test items, as they are currently created, are time-consuming and expensive to develop because each individual item is written, initially, by a content specialist and then reviewed, edited, and revised by groups of content specialists to ensure the items yield reliable and valid information. Hence, item development is one of the most important problems that must be solved before we can migrate to computer adaptive testing to support life-long learning, because large numbers of high-quality, content-specific test items are required.
One promising item development method that may be used to address this challenge is automatic item generation. Automatic item generation is a relatively new but rapidly evolving research area in which cognitive and psychometric modelling practices are used to produce hundreds of new test items with the aid of computer technology. The purpose of our presentation is to describe a new methodology for generating both the items and the rationales required to solve each generated item, in order to produce the feedback needed to support life-long learning. Our item generation methodology will first be described. To ensure our description is practical, the method will also be demonstrated using generated items from the health sciences to show how item generation can promote life-long learning for medical educators and practitioners.
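Template-based automatic item generation of the kind described here can be illustrated with a toy model. The template, variable names, and values below are invented for illustration and are not from the authors' item models:

```python
# Toy cognitive-model-driven item generation: every combination of the
# model's variable values is substituted into an item template.
from itertools import product

ITEM_TEMPLATE = ("A patient presents with {symptom} and {finding}. "
                 "What is the most likely diagnosis?")

def generate_items(model):
    # model maps template variable names to lists of allowed values;
    # one item is generated per combination of values.
    names = sorted(model)
    return [ITEM_TEMPLATE.format(**dict(zip(names, values)))
            for values in product(*(model[n] for n in names))]
```

A real cognitive model would also carry constraints among variables and the reasoning rationale for each generated key and distractor; this sketch shows only the combinatorial core.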
10aAdaptive Testing10aformative feedback10aItem generation1 aGierl, Mark1 aBulut, Okan uhttps://drive.google.com/open?id=1O5KDFtQlDLvhNoDr7X4JO4arpJkIHKUP01456nas a2200133 4500008004100000245007100041210006900112260005500181520096500236653002101201653000801222100002101230856007101251 2017 eng d00aHow Adaptive is an Adaptive Test: Are all Adaptive Tests Adaptive?0 aHow Adaptive is an Adaptive Test Are all Adaptive Tests Adaptive aNiigata, JapanbNiigata Seiryo Universityc08/20173 aThere are many different kinds of adaptive tests, but they all share the characteristic that some feature of the test is customized to the purpose of the test. In the time allotted, it is impossible to consider all of these types of adaptation, so this address will focus on the "classic" adaptive test that matches the difficulty of the test to the capabilities of the person being tested. This address will first present information on the maximum level of adaptation that can occur and then compare the amount of adaptation that typically occurs on an operational adaptive test to that maximum. An index is proposed to summarize the amount of adaptation, and it is argued that this type of index should be reported for operational adaptive tests to show the amount of adaptation that typically occurs.
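One simple way to quantify "amount of adaptation" — an illustrative index of our own, not necessarily the index proposed in the address — is the correlation between examinees' final ability estimates and the mean difficulty of the items each examinee was administered: near 1.0 for a fully adaptive test, 0.0 for a fixed form:

```python
def adaptation_index(thetas, mean_b_administered):
    # Pearson correlation between final ability estimates and the mean
    # difficulty of the items each examinee received (illustrative index).
    n = len(thetas)
    mt = sum(thetas) / n
    mb = sum(mean_b_administered) / n
    cov = sum((t - mt) * (b - mb) for t, b in zip(thetas, mean_b_administered))
    vt = sum((t - mt) ** 2 for t in thetas)
    vb = sum((b - mb) ** 2 for b in mean_b_administered)
    if vb == 0.0 or vt == 0.0:
        return 0.0  # fixed form: no variation in administered difficulty
    return cov / (vt ** 0.5 * vb ** 0.5)
```

Reporting such an index alongside an operational CAT's results makes the claim "this test is adaptive" checkable rather than nominal.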
10aAdaptive Testing10aCAT1 aReckase, Mark, D uhttps://drive.google.com/open?id=1Nj-zDCKk3DvHA4Jlp1qkb2XovmHeQfxu04649nas a2200145 4500008004100000245010200041210006900143260005500212520406800267653003004335653001604365653002204381100002904403856007104432 2017 eng d00aUsing Automated Item Generation in a Large-scale Medical Licensure Exam Program: Lessons Learned.0 aUsing Automated Item Generation in a Largescale Medical Licensur aNiigata, JapanbNiigata Seiryo Universityc08.20173 aOn-demand testing has become commonplace with most large-scale testing programs. Continuous testing is appealing for candidates in that it affords greater flexibility in scheduling a session at the desired location. Furthermore, the push for more comprehensive systems of assessment (e.g. CBAL) is predicated on the availability of more frequently administered tasks given the purposeful link between instruction and assessment in these frameworks. However, continuous testing models impose several challenges to programs, including overexposure of items. Robust item banks are therefore needed to support routine retirement and replenishment of items. In a traditional approach to developing items, content experts select a topic and then develop an item consisting of a stem, lead-in question, a correct answer and list of distractors. The item then undergoes review by a panel of experts to validate the content and identify any potential flaws. The process involved in developing quality MCQ items can be time-consuming as well as costly, with estimates as high as $1500-$2500 USD per item (Rudner, 2010). The Medical Council of Canada (MCC) has been exploring a novel item development process to supplement traditional approaches. Specifically, the use of automated item generation (AIG), which uses technology to generate test items from cognitive models, has been studied for over five years. Cognitive models are representations of the knowledge and skills that are required to solve any given problem. 
While developing a cognitive model for a medical scenario, for example, content experts are asked to deconstruct the (clinical) reasoning process involved via clearly stated variables and related elements. The latter information is then entered into a computer program that uses algorithms to generate MCQs. The MCC has been piloting AIG-based items for over five years with the MCC Qualifying Examination Part I (MCCQE I), a prerequisite for licensure in Canada. The aim of this presentation is to provide an overview of the practical lessons learned in the use and operational rollout of AIG with the MCCQE I. Psychometrically, the quality of the items is at least equal, and in many instances superior, to that of traditionally written MCQs, based on difficulty, discrimination, and information. In fact, 96% of the AIG-based items piloted in a recent administration were retained for future operational scoring based on pre-defined inclusion criteria. AIG also offers a framework for the systematic creation of plausible distractors, in that the content experts not only need to provide the clinical reasoning underlying a correct response but also the cognitive errors associated with each of the distractors (Lai et al., 2016). Consequently, AIG holds great promise for improving and tailoring diagnostic feedback for remedial purposes (Pugh, De Champlain, Gierl, Lai, & Touchie, 2016). Furthermore, our test development process has been greatly enhanced by the addition of AIG, as it requires that item writers use metacognitive skills to describe how they solve problems. We are hopeful that sharing our experiences with attendees might not only help other testing organizations interested in adopting AIG, but also foster discussion which might benefit all participants.
References
Lai, H., Gierl, M.J., Touchie, C., Pugh, D., Boulais, A.P., & De Champlain, A.F. (2016). Using automatic item generation to improve the quality of MCQ distractors. Teaching and Learning in Medicine, 28, 166-173.
Pugh, D., De Champlain, A.F., Lai, H., Gierl, M., & Touchie, C. (2016). Using cognitive models to develop quality multiple choice questions. Medical Teacher, 38, 838-843.
Rudner, L. (2010). Implementing the Graduate Management Admission Test Computerized Adaptive Test. In W. van der Linden & C. Glass (Eds.), Elements of adaptive testing (pp. 151-165). New York, NY: Springer.
10aAutomated item generation10alarge scale10amedical licensure1 aDe Champlain, André, F. uhttps://drive.google.com/open?id=14N8hUc8qexAy5W_94TykEDABGVIJHG1h00723nas a2200205 4500008004500000022001500045245013200060210006900192300000900261490000600270653002100276653003000297653003000327653001600357653002300373100002100396700002000417700002000437856006000457 2016 Engldsh a2165-6592 00aEffect of Imprecise Parameter Estimation on Ability Estimation in a Multistage Test in an Automatic Item Generation Context 0 aEffect of Imprecise Parameter Estimation on Ability Estimation i a1-180 v410aAdaptive Testing10aautomatic item generation10aerrors in item parameters10aitem clones10amultistage testing1 aColvin, Kimberly1 aKeller, Lisa, A1 aRobin, Frederic uhttp://iacat.org/jcat/index.php/jcat/article/view/59/2700516nas a2200193 4500008004500000022001400045245004500059210004200104300000900146490000600155653001300161653001500174653001300189653001200202653001100214653001200225100002100237856006400258 2015 Engldsh a2165-659200aImplementing a CAT: The AMC Experience 0 aImplementing a CAT The AMC Experience a1-120 v310aadaptive10aAssessment10acomputer10amedical10aonline10aTesting1 aBarnard, John, J uhttp://www.iacat.org/jcat/index.php/jcat/article/view/52/2500495nas a2200121 4500008004100000245009500041210006900136653001800205653000800223653000900231100001900240856011400259 2011 eng d00aBuilding Affordable CD-CAT Systems for Schools To Address Today's Challenges In Assessment0 aBuilding Affordable CDCAT Systems for Schools To Address Todays 10aaffordability10aCAT10acost1 aChang, Hua-Hua uhttp://mail.iacat.org/content/building-affordable-cd-cat-systems-schools-address-todays-challenges-assessment01252nas a2200181 4500008004100000245012100041210006900162260001200231520055600243653003300799653000800832653001900840653003500859100001800894700001600912700002200928856012000950 2011 eng d00aItem Selection Methods based on Multiple Objective Approaches for Classification of Respondents into 
Multiple Levels0 aItem Selection Methods based on Multiple Objective Approaches fo c10/20113 aIs it possible to develop new item selection methods which take advantage of the fact that we want to classify into multiple categories? New methods: Taking multiple points on the ability scale into account; Based on multiple objective approaches.
Conclusions:
- Sequential Classification Tests had higher ATL (average test length) than Adaptive Classification Tests
- Sequential Classification Tests had slightly lower PCD (proportion of correct decisions) than Adaptive Classification Tests
- Results also hold with three and four cutting points
10aadaptive classification test10aCAT10aitem selection10asequential classification test1 aGroen, Maaike1 aEggen, Theo1 aVeldkamp, Bernard uhttp://mail.iacat.org/content/item-selection-methods-based-multiple-objective-approaches-classification-respondents02121nas a2200193 4500008004100000245007900041210006900120260001200189520146800201653002801669653000801697653001801705100002001723700001701743700002301760700002301783700002301806856009801829 2011 eng d00aThe Use of Decision Trees for Adaptive Item Selection and Score Estimation0 aUse of Decision Trees for Adaptive Item Selection and Score Esti c10/20113 aConducted post-hoc simulations comparing the relative efficiency, and precision of decision trees (using CHAID and CART) vs. IRT-based CAT.
- Measure: Global Appraisal of Individual Needs (GAIN) Substance Problem Scale (16 items)
  - Past-year symptom count (SPSy)
  - Recency of symptom scale (SPSr)
Conclusions:
- Decision tree methods were more efficient than CAT
  - CART for dichotomous items (SPSy)
  - CHAID for polytomous items (SPSr)
- Score bias was low in all conditions, particularly for decision trees using dichotomous items
- In early stages of administration, decision trees provided slightly higher correlations with the full scale and lower RMSE values
But:
- CAT outperformed decision tree methods in later stages of administration
- CAT also outperformed decision trees with respect to sensitivity to group differences as measured by effect size
Conclusions:
- CAT selects items based on two criteria: item location relative to the current estimate of theta, and item discrimination
- Decision Trees select items that best discriminate between groups defined by the total score
- CAT is optimal only when trait level is well estimated
- Findings suggest that combining decision tree followed by CAT item selection may be advantageous
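The contrast drawn in these conclusions can be made concrete with a small sketch of tree-style item selection. The code below (all names and data invented for illustration) picks the dichotomous item whose response best separates examinees by total score — the kind of split a CART-like tree makes — whereas a CAT would instead maximize item information at the current theta estimate:

```python
def tree_select(item_responses, total_scores):
    # Decision-tree-style selection: choose the item whose 0/1 response
    # yields the largest gap in mean total score between responders
    # answering 1 and responders answering 0.
    best, best_gap = None, -1.0
    for item, resp in item_responses.items():
        hi = [s for r, s in zip(resp, total_scores) if r == 1]
        lo = [s for r, s in zip(resp, total_scores) if r == 0]
        if not hi or not lo:
            continue  # item does not split the sample
        gap = abs(sum(hi) / len(hi) - sum(lo) / len(lo))
        if gap > best_gap:
            best, best_gap = item, gap
    return best
```

Because this criterion needs no theta estimate, it works well in the first few items (when theta is poorly estimated), which matches the finding that a tree stage followed by CAT selection may be advantageous.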
10aadaptive item selection10aCAT10adecision tree1 aRiley, Barth, B1 aFunk, Rodney1 aDennis, Michael, L1 aLennox, Richard, D1 aFinkelman, Matthew uhttp://mail.iacat.org/content/use-decision-trees-adaptive-item-selection-and-score-estimation03104nas a2200445 4500008004100000020004100041245012000082210006900202250001500271260001000286300001100296490000700307520175400314653003802068653002102106653001002127653000902137653002202146653002802168653003302196653001102229653001102240653000902251653001602260653001802276653001902294653003102313653003102344653001602375100001602391700001002407700001402417700001502431700001402446700001502460700001802475700002402493700001802517856012302535 2010 eng d a0161-8105 (Print)0161-8105 (Linking)00aDevelopment and validation of patient-reported outcome measures for sleep disturbance and sleep-related impairments0 aDevelopment and validation of patientreported outcome measures f a2010/06/17 cJun 1 a781-920 v333 aSTUDY OBJECTIVES: To develop an archive of self-report questions assessing sleep disturbance and sleep-related impairments (SRI), to develop item banks from this archive, and to validate and calibrate the item banks using classic validation techniques and item response theory analyses in a sample of clinical and community participants. DESIGN: Cross-sectional self-report study. SETTING: Academic medical center and participant homes. PARTICIPANTS: One thousand nine hundred ninety-three adults recruited from an Internet polling sample and 259 adults recruited from medical, psychiatric, and sleep clinics. INTERVENTIONS: None. MEASUREMENTS AND RESULTS: This study was part of PROMIS (Patient-Reported Outcomes Information System), a National Institutes of Health Roadmap initiative. Self-report item banks were developed through an iterative process of literature searches, collecting and sorting items, expert content review, qualitative patient research, and pilot testing. 
Internal consistency, convergent validity, and exploratory and confirmatory factor analysis were examined in the resulting item banks. Factor analyses identified 2 preliminary item banks, sleep disturbance and SRI. Item response theory analyses and expert content review narrowed the item banks to 27 and 16 items, respectively. Validity of the item banks was supported by moderate to high correlations with existing scales and by significant differences in sleep disturbance and SRI scores between participants with and without sleep disorders. CONCLUSIONS: The PROMIS sleep disturbance and SRI item banks have excellent measurement properties and may prove to be useful for assessing general aspects of sleep and SRI with various groups of patients and interventions.10a*Outcome Assessment (Health Care)10a*Self Disclosure10aAdult10aAged10aAged, 80 and over10aCross-Sectional Studies10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMiddle Aged10aPsychometrics10aQuestionnaires10aReproducibility of Results10aSleep Disorders/*diagnosis10aYoung Adult1 aBuysse, D J1 aYu, L1 aMoul, D E1 aGermain, A1 aStover, A1 aDodds, N E1 aJohnston, K L1 aShablesky-Cade, M A1 aPilkonis, P A uhttp://mail.iacat.org/content/development-and-validation-patient-reported-outcome-measures-sleep-disturbance-and-sleep02652nas a2200181 4500008004100000020001400041245007900055210006900134300001000203490000700213520201300220653005202233653004002285100001502325700001302340700001502353856010202368 2009 eng d a0360-131500aAn adaptive testing system for supporting versatile educational assessment0 aadaptive testing system for supporting versatile educational ass a53-670 v523 aWith the rapid growth of computer and mobile technology, it is a challenge to integrate computer based test (CBT) with mobile learning (m-learning) especially for formative assessment and self-assessment. In terms of self-assessment, computer adaptive test (CAT) is a proper way to enable students to evaluate themselves. 
In CAT, students are assessed through a process that uses item response theory (IRT), a well-founded psychometric theory. A large item bank is indispensable to a test, but when a CAT system has a large item bank, IRT-based test item selection becomes more tedious. Besides a large item bank, an item exposure mechanism is also essential to a testing system. However, IRT alone does not address these points. These reasons motivated the authors to carry out this study. This paper describes a design aimed at the development and implementation of an adaptive testing system. The system can support several assessment functions and different devices. Moreover, the researchers apply a novel approach, particle swarm optimization (PSO), to alleviate the computational complexity and resolve the problem of item exposure. Throughout the development of the system, a formative evaluation was embedded as an integral part of the design methodology and used to improve the system. After the system was formally released onto the web, questionnaires and experiments were conducted to evaluate the usability, precision, and efficiency of the system. The results of these evaluations indicated that the system provides adaptive testing for different devices and supports versatile assessment functions. Moreover, the system can estimate students' ability reliably and validly and conduct an adaptive test efficiently. Furthermore, the computational complexity of the system was alleviated by the PSO approach. 
By the approach, the test item selection procedure becomes efficient and the average best fitness values are very close to the optimal solutions.10aArchitectures for educational technology system10aDistance education and telelearning1 aHuang, Y-M1 aLin, Y-T1 aCheng, S-C uhttp://mail.iacat.org/content/adaptive-testing-system-supporting-versatile-educational-assessment02750nas a2200409 4500008004100000020004600041245009400087210006900181250001500250260000800265300001200273490000700285520144500292653001501737653002001752653003101772653003001803653002001833653001901853653002601872653001101898653001101909653000901920653001601929653002601945653003701971653003002008653004402038653001802082653002002100653002802120100002002148700002302168700001602191700001702207856011602224 2009 eng d a1528-8447 (Electronic)1526-5900 (Linking)00aDevelopment and preliminary testing of a computerized adaptive assessment of chronic pain0 aDevelopment and preliminary testing of a computerized adaptive a a2009/07/15 cSep a932-9430 v103 aThe aim of this article is to report the development and preliminary testing of a prototype computerized adaptive test of chronic pain (CHRONIC PAIN-CAT) conducted in 2 stages: (1) evaluation of various item selection and stopping rules through real data-simulated administrations of CHRONIC PAIN-CAT; (2) a feasibility study of the actual prototype CHRONIC PAIN-CAT assessment system conducted in a pilot sample. Item calibrations developed from a US general population sample (N = 782) were used to program a pain severity and impact item bank (kappa = 45), and real data simulations were conducted to determine a CAT stopping rule. The CHRONIC PAIN-CAT was programmed on a tablet PC using QualityMetric's Dynamic Health Assessment (DYHNA) software and administered to a clinical sample of pain sufferers (n = 100). The CAT was completed in significantly less time than the static (full item bank) assessment (P < .001). 
On average, 5.6 items were dynamically administered by CAT to achieve a precise score. Scores estimated from the 2 assessments were highly correlated (r = .89), and both assessments discriminated across pain severity levels (P < .001, RV = .95). Patients' evaluations of the CHRONIC PAIN-CAT were favorable. PERSPECTIVE: This report demonstrates that the CHRONIC PAIN-CAT is feasible for administration in a clinic. The application has the potential to improve pain assessment and help clinicians manage chronic pain.10a*Computers10a*Questionnaires10aActivities of Daily Living10aAdaptation, Psychological10aChronic Disease10aCohort Studies10aDisability Evaluation10aFemale10aHumans10aMale10aMiddle Aged10aModels, Psychological10aOutcome Assessment (Health Care)10aPain Measurement/*methods10aPain, Intractable/*diagnosis/psychology10aPsychometrics10aQuality of Life10aUser-Computer Interface1 aAnatchkova, M D1 aSaris-Baglama, R N1 aKosinski, M1 aBjorner, J B uhttp://mail.iacat.org/content/development-and-preliminary-testing-computerized-adaptive-assessment-chronic-pain02882nas a2200493 4500008004100000020004100041245014100082210006900223250001500292260000800307300001100315490000700326520125100333653003001584653001001614653000901624653004601633653003301679653001101712653003101723653001101754653000901765653003301774653001601807653002401823653004601847653005501893653005501948653004602003653001902049653003102068653001402099100001602113700001502129700001302144700001402157700001502171700001702186700001502203700001702218700001502235700001302250856012502263 2009 eng d a0090-5550 (Print)0090-5550 (Linking)00aDevelopment of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis0 aDevelopment of an item bank for the assessment of depression in a2009/05/28 cMay a186-970 v543 aOBJECTIVE: The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes 
participants' test burden. The present study aimed at developing a new item bank that allows for assessing depression in persons with mental and persons with somatic diseases. METHOD: The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD =14.0; 44.7% women) to allow for validation of the item bank in both groups. Persons answered a pool of 182 depression items on a 5-point Likert scale. RESULTS: Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. CONCLUSIONS: The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings.10aAdaptation, Psychological10aAdult10aAged10aDepressive Disorder/*diagnosis/psychology10aDiagnosis, Computer-Assisted10aFemale10aHeart Diseases/*psychology10aHumans10aMale10aMental Disorders/*psychology10aMiddle Aged10aModels, Statistical10aOtorhinolaryngologic Diseases/*psychology10aPersonality Assessment/statistics & numerical data10aPersonality Inventory/*statistics & numerical data10aPsychometrics/statistics & numerical data10aQuestionnaires10aReproducibility of Results10aSick Role1 aForkmann, T1 aBoecker, M1 aNorra, C1 aEberle, N1 aKircher, T1 aSchauerte, P1 aMischke, K1 aWesthofen, M1 aGauggel, S1 aWirtz, M uhttp://mail.iacat.org/content/development-item-bank-assessment-depression-persons-mental-illnesses-and-physical-diseases02752nas a2200433 
4500008004100000020004600041245012800087210006900215250001500284300001200299490000700311520139300318653003401711653001501745653001001760653000901770653002201779653002501801653001101826653001101837653000901848653001601857653001501873653003801888653001901926653003101945653002801976653004802004653002202052100002002074700001202094700001402106700001602120700001402136700001702150700001502167700001502182856012102197 2009 eng d a1878-5921 (Electronic)0895-4356 (Linking)00aAn evaluation of patient-reported outcomes found computerized adaptive testing was efficient in assessing stress perception0 aevaluation of patientreported outcomes found computerized adapti a2008/07/22 a278-2870 v623 aOBJECTIVES: This study aimed to develop and evaluate a first computerized adaptive test (CAT) for the measurement of stress perception (Stress-CAT), in terms of the two dimensions: exposure to stress and stress reaction. STUDY DESIGN AND SETTING: Item response theory modeling was performed using a two-parameter model (Generalized Partial Credit Model). The evaluation of the Stress-CAT comprised a simulation study and real clinical application. A total of 1,092 psychosomatic patients (N1) were studied. Two hundred simulees (N2) were generated for a simulated response data set. Then the Stress-CAT was given to n=116 inpatients, (N3) together with established stress questionnaires as validity criteria. RESULTS: The final banks included n=38 stress exposure items and n=31 stress reaction items. In the first simulation study, CAT scores could be estimated with a high measurement precision (SE<0.32; rho>0.90) using 7.0+/-2.3 (M+/-SD) stress reaction items and 11.6+/-1.7 stress exposure items. The second simulation study reanalyzed real patients data (N1) and showed an average use of items of 5.6+/-2.1 for the dimension stress reaction and 10.0+/-4.9 for the dimension stress exposure. Convergent validity showed significantly high correlations. 
CONCLUSIONS: The Stress-CAT is short and precise, potentially lowering the response burden of patients in clinical decision making.10a*Diagnosis, Computer-Assisted10aAdolescent10aAdult10aAged10aAged, 80 and over10aConfidence Intervals10aFemale10aHumans10aMale10aMiddle Aged10aPerception10aQuality of Health Care/*standards10aQuestionnaires10aReproducibility of Results10aSickness Impact Profile10aStress, Psychological/*diagnosis/psychology10aTreatment Outcome1 aKocalevent, R D1 aRose, M1 aBecker, J1 aWalter, O B1 aFliege, H1 aBjorner, J B1 aKleiber, D1 aKlapp, B F uhttp://mail.iacat.org/content/evaluation-patient-reported-outcomes-found-computerized-adaptive-testing-was-efficient01655nas a2200289 4500008004100000020004100041245011100082210006900193250001500262260000800277300001100285490000700296520053700303653004800840653006200888653005700950653001101007653002701018653002401045653005101069653004701120653003101167653001301198100001301211700001901224856012201243 2009 eng d a0007-1102 (Print)0007-1102 (Linking)00aThe maximum priority index method for severely constrained item selection in computerized adaptive testing0 amaximum priority index method for severely constrained item sele a2008/06/07 cMay a369-830 v623 aThis paper introduces a new heuristic approach, the maximum priority index (MPI) method, for severely constrained item selection in computerized adaptive testing. Our simulation study shows that it is able to accommodate various non-statistical constraints simultaneously, such as content balancing, exposure control, answer key balancing, and so on. 
Compared with the weighted deviation modelling method, it leads to fewer constraint violations and better exposure control while maintaining the same level of measurement precision.10aAptitude Tests/*statistics & numerical data10aDiagnosis, Computer-Assisted/*statistics & numerical data10aEducational Measurement/*statistics & numerical data10aHumans10aMathematical Computing10aModels, Statistical10aPersonality Tests/*statistics & numerical data10aPsychometrics/*statistics & numerical data10aReproducibility of Results10aSoftware1 aCheng, Y1 aChang, Hua-Hua uhttp://mail.iacat.org/content/maximum-priority-index-method-severely-constrained-item-selection-computerized-adaptive03148nas a2200457 4500008004100000020004100041245015500082210006900237250001500306260000800321300001200329490000700341520174100348653002502089653001902114653002502133653003002158653001502188653003602203653001002239653002102249653003302270653001102303653001102314653000902325653001802334653001702352653001902369653001602388100001502404700001002419700001502429700002502444700001802469700001702487700001602504700001602520700001402536700001602550856012402566 2009 eng d a0962-9343 (Print)0962-9343 (Linking)00aMeasuring global physical health in children with cerebral palsy: Illustration of a multidimensional bi-factor model and computerized adaptive testing0 aMeasuring global physical health in children with cerebral palsy a2009/02/18 cApr a359-3700 v183 aPURPOSE: The purposes of this study were to apply a bi-factor model for the determination of test dimensionality and a multidimensional CAT using computer simulations of real data for the assessment of a new global physical health measure for children with cerebral palsy (CP). METHODS: Parent respondents of 306 children with cerebral palsy were recruited from four pediatric rehabilitation hospitals and outpatient clinics. 
We compared confirmatory factor analysis results across four models: (1) one-factor unidimensional; (2) two-factor multidimensional (MIRT); (3) bi-factor MIRT with fixed slopes; and (4) bi-factor MIRT with varied slopes. We tested whether the general and content (fatigue and pain) person score estimates could discriminate across severity and types of CP, and whether score estimates from a simulated CAT were similar to estimates based on the total item bank, and whether they correlated as expected with external measures. RESULTS: Confirmatory factor analysis suggested separate pain and fatigue sub-factors; all 37 items were retained in the analyses. From the bi-factor MIRT model with fixed slopes, the full item bank scores discriminated across levels of severity and types of CP, and compared favorably to external instruments. CAT scores based on 10- and 15-item versions accurately captured the global physical health scores. CONCLUSIONS: The bi-factor MIRT CAT application, especially the 10- and 15-item versions, yielded accurate global physical health scores that discriminated across known severity groups and types of CP, and correlated as expected with concurrent measures. 
The CATs have potential for collecting complex data on the physical health of children with CP in an efficient manner.10a*Computer Simulation10a*Health Status10a*Models, Statistical10aAdaptation, Psychological10aAdolescent10aCerebral Palsy/*physiopathology10aChild10aChild, Preschool10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMassachusetts10aPennsylvania10aQuestionnaires10aYoung Adult1 aHaley, S M1 aNi, P1 aDumas, H M1 aFragala-Pinkham, M A1 aHambleton, RK1 aMontpetit, K1 aBilodeau, N1 aGorton, G E1 aWatson, K1 aTucker, C A uhttp://mail.iacat.org/content/measuring-global-physical-health-children-cerebral-palsy-illustration-multidimensional-bi02905nas a2200289 4500008004100000020004100041245011100082210006900193250001500262260000800277300001400285490000700299520193300306653002702239653003802266653004102304653001902345653001102364653001402375653003102389100001502420700001302435700001202448700001602460700001302476856012602489 2009 eng d a0315-162X (Print)0315-162X (Linking)00aProgress in assessing physical function in arthritis: PROMIS short forms and computerized adaptive testing0 aProgress in assessing physical function in arthritis PROMIS shor a2009/09/10 cSep a2061-20660 v363 aOBJECTIVE: Assessing self-reported physical function/disability with the Health Assessment Questionnaire Disability Index (HAQ) and other instruments has become central in arthritis research. Item response theory (IRT) and computerized adaptive testing (CAT) techniques can increase reliability and statistical power. IRT-based instruments can improve measurement precision substantially over a wider range of disease severity. These modern methods were applied and the magnitude of improvement was estimated. 
METHODS: A 199-item physical function/disability item bank was developed by distilling 1865 items to 124, including Legacy Health Assessment Questionnaire (HAQ) and Physical Function-10 items, and improving precision through qualitative and quantitative evaluation in over 21,000 subjects, which included about 1500 patients with rheumatoid arthritis and osteoarthritis. Four new instruments, (A) Patient-Reported Outcomes Measurement Information System (PROMIS) HAQ, which evolved from the original (Legacy) HAQ; (B) "best" PROMIS 10; (C) 20-item static (short) forms; and (D) simulated PROMIS CAT, which sequentially selected the most informative item, were compared with the HAQ. RESULTS: Online and mailed administration modes yielded similar item and domain scores. The HAQ and PROMIS HAQ 20-item scales yielded greater information content versus other scales in patients with more severe disease. The "best" PROMIS 20-item scale outperformed the other 20-item static forms over a broad range of 4 standard deviations. The 10-item simulated PROMIS CAT outperformed all other forms. CONCLUSION: Improved items and instruments yielded better information. The PROMIS HAQ is currently available and considered validated. The new PROMIS short forms, after validation, are likely to represent further improvement. 
CAT-based physical function/disability assessment offers superior performance over static forms of equal length.10a*Disability Evaluation10a*Outcome Assessment (Health Care)10aArthritis/diagnosis/*physiopathology10aHealth Surveys10aHumans10aPrognosis10aReproducibility of Results1 aFries, J F1 aCella, D1 aRose, M1 aKrishnan, E1 aBruce, B uhttp://mail.iacat.org/content/progress-assessing-physical-function-arthritis-promis-short-forms-and-computerized-adaptive02436nas a2200385 4500008004100000020004100041245009300082210006900175250001500244260000800259300001100267490000700278520128100285653003201566653002701598653002001625653002901645653001001674653000901684653001901693653003401712653001101746653001101757653000901768653001601777653004601793100001501839700001001854700001501864700001101879700001201890700001401902700001601916856011801932 2009 eng d a0962-9343 (Print)0962-9343 (Linking)00aReplenishing a computerized adaptive test of patient-reported daily activity functioning0 aReplenishing a computerized adaptive test of patientreported dai a2009/03/17 cMay a461-710 v183 aPURPOSE: Computerized adaptive testing (CAT) item banks may need to be updated, but before new items can be added, they must be linked to the previous CAT. The purpose of this study was to evaluate 41 pretest items prior to including them into an operational CAT. METHODS: We recruited 6,882 patients with spine, lower extremity, upper extremity, and nonorthopedic impairments who received outpatient rehabilitation in one of 147 clinics across 13 states of the USA. Forty-one new Daily Activity (DA) items were administered along with the Activity Measure for Post-Acute Care Daily Activity CAT (DA-CAT-1) in five separate waves. We compared the scoring consistency with the full item bank, test information function (TIF), person standard errors (SEs), and content range of the DA-CAT-1 to the new CAT (DA-CAT-2) with the pretest items by real data simulations. RESULTS: We retained 29 of the 41 pretest items. 
Scores from the DA-CAT-2 were more consistent (ICC = 0.90 versus 0.96) than DA-CAT-1 when compared with the full item bank. TIF and person SEs were improved for persons with higher levels of DA functioning, and ceiling effects were reduced from 16.1% to 6.1%. CONCLUSIONS: Item response theory and online calibration methods were valuable in improving the DA-CAT.10a*Activities of Daily Living10a*Disability Evaluation10a*Questionnaires10a*User-Computer Interface10aAdult10aAged10aCohort Studies10aComputer-Assisted Instruction10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods1 aHaley, S M1 aNi, P1 aJette, A M1 aTao, W1 aMoed, R1 aMeyers, D1 aLudlow, L H uhttp://mail.iacat.org/content/replenishing-computerized-adaptive-test-patient-reported-daily-activity-functioning03437nas a2200481 4500008004100000020004600041245013800087210006900225250001500294260000800309300001200317490000700329520191400336653002702250653002302277653003102300653001502331653001602346653001002362653002102372653002402393653002302417653003802440653001102478653002202489653001102511653001102522653000902533653003702542653002102579653003102600653002602631653001702657653003202674653001602706653002802722100001602750700001502766700001002781700001502791700002502806856012402831 2008 eng d a1532-821X (Electronic)0003-9993 (Linking)00aAssessing self-care and social function using a computer adaptive testing version of the pediatric evaluation of disability inventory0 aAssessing selfcare and social function using a computer adaptive a2008/04/01 cApr a622-6290 v893 aOBJECTIVE: To examine score agreement, validity, precision, and response burden of a prototype computer adaptive testing (CAT) version of the self-care and social function scales of the Pediatric Evaluation of Disability Inventory compared with the full-length version of these scales. DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; cross-sectional prospective study. 
SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics; community-based day care, preschool, and children's homes. PARTICIPANTS: Children with disabilities (n=469) and 412 children with no disabilities (analytic sample); 38 children with disabilities and 35 children without disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from prototype CAT applications of each scale using 15-, 10-, and 5-item stopping rules; scores from the full-length self-care and social function scales; time (in seconds) to complete assessments and respondent ratings of burden. RESULTS: Scores from both computer simulations and field administration of the prototype CATs were highly consistent with scores from full-length administration (r range, .94-.99). Using computer simulation of retrospective data, discriminant validity, and sensitivity to change of the CATs closely approximated that of the full-length scales, especially when the 15- and 10-item stopping rules were applied. In the cross-validation study the time to administer both CATs was 4 minutes, compared with over 16 minutes to complete the full-length scales. 
CONCLUSIONS: Self-care and social function score estimates from CAT administration are highly comparable with those obtained from full-length scale administration, with small losses in validity and precision and substantial decreases in administration time.10a*Disability Evaluation10a*Social Adjustment10aActivities of Daily Living10aAdolescent10aAge Factors10aChild10aChild, Preschool10aComputer Simulation10aCross-Over Studies10aDisabled Children/*rehabilitation10aFemale10aFollow-Up Studies10aHumans10aInfant10aMale10aOutcome Assessment (Health Care)10aReference Values10aReproducibility of Results10aRetrospective Studies10aRisk Factors10aSelf Care/*standards/trends10aSex Factors10aSickness Impact Profile1 aCoster, W J1 aHaley, S M1 aNi, P1 aDumas, H M1 aFragala-Pinkham, M A uhttp://mail.iacat.org/content/assessing-self-care-and-social-function-using-computer-adaptive-testing-version-pediatric03042nas a2200481 4500008004100000020004600041245012200087210006900209250001500278260000800293300001200301490000700313520155700320653003201877653003101909653002201940653002001962653001001982653000901992653002202001653002802023653003302051653001102084653001102095653002502106653000902131653001602140653004602156653002202202653002402224653003002248653002902278100001502307700001402322700001502336700002402351700001802375700001102393700001602404700001002420700001502430856011502445 2008 eng d a1532-821X (Electronic)0003-9993 (Linking)00aComputerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes0 aComputerized adaptive testing for followup after discharge from a2008/01/30 cFeb a275-2830 v893 aOBJECTIVES: To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness. 
DESIGN: Longitudinal, prospective cohort study of patients interviewed approximately 2 weeks after discharge from inpatient rehabilitation and 3 months later. SETTING: Follow-up interviews conducted in patient's home setting. PARTICIPANTS: Adults (N=94) with diagnoses of neurologic, orthopedic, or medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Participation domains of mobility, domestic life, and community, social, & civic life, measured using a CAT version of the Participation Measure for Postacute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53). RESULTS: The PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (intraclass correlation coefficient, model 3,1, .71-.81). On average, the PM-PAC-CAT was completed in 42% of the time and with only 48% of the items as compared with the PM-PAC-53. Both formats discriminated across functional severity groups. The PM-PAC-CAT had modest reductions in sensitivity and responsiveness to patient-reported change over a 3-month interval as compared with the PM-PAC-53. 
CONCLUSIONS: Although continued evaluation is warranted, accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a sizeable reduction in respondent burden.10a*Activities of Daily Living10a*Adaptation, Physiological10a*Computer Systems10a*Questionnaires10aAdult10aAged10aAged, 80 and over10aChi-Square Distribution10aFactor Analysis, Statistical10aFemale10aHumans10aLongitudinal Studies10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aPatient Discharge10aProspective Studies10aRehabilitation/*standards10aSubacute Care/*standards1 aHaley, S M1 aGandek, B1 aSiebens, H1 aBlack-Schaffer, R M1 aSinclair, S J1 aTao, W1 aCoster, W J1 aNi, P1 aJette, A M uhttp://mail.iacat.org/content/computerized-adaptive-testing-follow-after-discharge-inpatient-rehabilitation-ii03314nas a2200433 4500008004100000020004600041245007700087210006900164250001500233260001100248300001200259490000700271520203200278653002702310653003002337653002102367653001002388653000902398653001502407653003602422653002102458653004402479653002402523653001102547653001102558653001302569653000902582653001602591653003002607653003002637653003102667100001502698700001302713700001502726700001402741700001502755700001402770856009602784 2008 eng d a1528-1159 (Electronic)0362-2436 (Linking)00aComputerized adaptive testing in back pain: Validation of the CAT-5D-QOL0 aComputerized adaptive testing in back pain Validation of the CAT a2008/05/23 cMay 20 a1384-900 v333 aSTUDY DESIGN: We have conducted an outcome instrument validation study. OBJECTIVE: Our objective was to develop a computerized adaptive test (CAT) to measure 5 domains of health-related quality of life (HRQL) and assess its feasibility, reliability, validity, and efficiency. SUMMARY OF BACKGROUND DATA: Kopec and colleagues have recently developed item response theory based item banks for 5 domains of HRQL relevant to back pain and suitable for CAT applications. 
The domains are Daily Activities (DAILY), Walking (WALK), Handling Objects (HAND), Pain or Discomfort (PAIN), and Feelings (FEEL). METHODS: An adaptive algorithm was implemented in a web-based questionnaire administration system. The questionnaire included CAT-5D-QOL (5 scales), Modified Oswestry Disability Index (MODI), Roland-Morris Disability Questionnaire (RMDQ), SF-36 Health Survey, and standard clinical and demographic information. Participants were outpatients treated for mechanical back pain at a referral center in Vancouver, Canada. RESULTS: A total of 215 patients completed the questionnaire and 84 completed a retest. On average, patients answered 5.2 items per CAT-5D-QOL scale. Reliability ranged from 0.83 (FEEL) to 0.92 (PAIN) and was 0.92 for the MODI, RMDQ, and Physical Component Summary (PCS-36). The ceiling effect was 0.5% for PAIN compared with 2% for MODI and 5% for RMDQ. The CAT-5D-QOL scales correlated as anticipated with other measures of HRQL and discriminated well according to the level of satisfaction with current symptoms, duration of the last episode, sciatica, and disability compensation. The average relative discrimination index was 0.87 for PAIN, 0.67 for DAILY and 0.62 for WALK, compared with 0.89 for MODI, 0.80 for RMDQ, and 0.59 for PCS-36. CONCLUSION: The CAT-5D-QOL is feasible, reliable, valid, and efficient in patients with back pain. 
This methodology can be recommended for use in back pain research and should improve outcome assessment, facilitate comparisons across studies, and reduce patient burden.10a*Disability Evaluation10a*Health Status Indicators10a*Quality of Life10aAdult10aAged10aAlgorithms10aBack Pain/*diagnosis/psychology10aBritish Columbia10aDiagnosis, Computer-Assisted/*standards10aFeasibility Studies10aFemale10aHumans10aInternet10aMale10aMiddle Aged10aPredictive Value of Tests10aQuestionnaires/*standards10aReproducibility of Results1 aKopec, J A1 aBadii, M1 aMcKenna, M1 aLima, V D1 aSayre, E C1 aDvorak, M uhttp://mail.iacat.org/content/computerized-adaptive-testing-back-pain-validation-cat-5d-qol01877nas a2200205 4500008003900000245005600039210005600095300001000151490000800161520124900169653002101418653003001439653002501469653001801494653002501512100001301537700001701550700002101567856008301588 2008 d00aComputerized Adaptive Testing of Personality Traits0 aComputerized Adaptive Testing of Personality Traits a12-210 v2163 aA computerized adaptive testing (CAT) procedure was simulated with ordinal polytomous personality data collected using a
conventional paper-and-pencil testing format. An adapted Dutch version of the dominance scale of Gough and Heilbrun’s Adjective
Check List (ACL) was used. This version contained Likert response scales with five categories. Item parameters were estimated using Samejima’s graded response model from the responses of 1,925 subjects. The CAT procedure was simulated using the responses of 1,517 other subjects. The value of the required standard error in the stopping rule of the CAT was manipulated. The relationship between CAT latent trait estimates and estimates based on all dominance items was studied. Additionally, the pattern of relationships between the CAT latent trait estimates and the other ACL scales was compared to that between latent trait estimates based on the entire item pool and the other ACL scales. The CAT procedure resulted in latent trait estimates qualitatively equivalent to latent trait estimates based on all items, while a substantial reduction of the number of used items could be realized (at the stopping rule of 0.4 about 33% of the 36 items was used).
10aAdaptive Testing10acomputer-assisted testing10aItem Response Theory10aLikert scales10aPersonality Measures1 aHol, A M1 aVorst, H C M1 aMellenbergh, G J uhttp://mail.iacat.org/content/computerized-adaptive-testing-personality-traits03233nas a2200397 4500008004100000020002700041245014200068210006900210250001500279260001100294300001200305490000700317520193600324653002702260653003002287653001002317653000902327653002202336653003602358653001602394653002402410653004402434653001102478653001602489653002602505653003002531653003002561653003102591100001302622700001402635700001502649700001402664700001702678700001502695856012502710 2008 eng d a1528-1159 (Electronic)00aLetting the CAT out of the bag: Comparing computer adaptive tests and an 11-item short form of the Roland-Morris Disability Questionnaire0 aLetting the CAT out of the bag Comparing computer adaptive tests a2008/05/23 cMay 20 a1378-830 v333 aSTUDY DESIGN: A post hoc simulation of a computer adaptive administration of the items of a modified version of the Roland-Morris Disability Questionnaire. OBJECTIVE: To evaluate the effectiveness of adaptive administration of back pain-related disability items compared with a fixed 11-item short form. SUMMARY OF BACKGROUND DATA: Short form versions of the Roland-Morris Disability Questionnaire have been developed. An alternative to paper-and-pencil short forms is to administer items adaptively so that items are presented based on a person's responses to previous items. Theoretically, this allows precise estimation of back pain disability with administration of only a few items. MATERIALS AND METHODS: Data were gathered from 2 previously conducted studies of persons with back pain. An item response theory model was used to calibrate scores based on all items, items of a paper-and-pencil short form, and several computer adaptive tests (CATs). 
RESULTS: Correlations between each CAT condition and scores based on a 23-item version of the Roland-Morris Disability Questionnaire ranged from 0.93 to 0.98. Compared with an 11-item short form, an 11-item CAT produced scores that were significantly more highly correlated with scores based on the 23-item scale. CATs with even fewer items also produced scores that were highly correlated with scores based on all items. For example, scores from a 5-item CAT had a correlation of 0.93 with full scale scores. Seven- and 9-item CATs correlated at 0.95 and 0.97, respectively. A CAT with a standard-error-based stopping rule produced scores that correlated at 0.95 with full scale scores. CONCLUSION: A CAT-based back pain-related disability measure may be a valuable tool for use in clinical and research contexts. Use of CAT for other common measures in back pain research, such as other functional scales or measures of psychological distress, may offer similar advantages.10a*Disability Evaluation10a*Health Status Indicators10aAdult10aAged10aAged, 80 and over10aBack Pain/*diagnosis/psychology10aCalibration10aComputer Simulation10aDiagnosis, Computer-Assisted/*standards10aHumans10aMiddle Aged10aModels, Psychological10aPredictive Value of Tests10aQuestionnaires/*standards10aReproducibility of Results1 aCook, KF1 aChoi, S W1 aCrane, P K1 aDeyo, R A1 aJohnson, K L1 aAmtmann, D uhttp://mail.iacat.org/content/letting-cat-out-bag-comparing-computer-adaptive-tests-and-11-item-short-form-roland-morris03429nas a2200385 4500008004100000020004100041245010600082210006900188250001500257260001200272300001000284490000700294520220300301653002702504653001502531653001002546653002102556653002402577653002802601653003802629653001102667653001102678653001102689653003902700653000902739653002402748653003102772653004002803100001802843700001502861700001302876700001702889700001402906856012302920 2008 eng d a0271-6798 (Print)0271-6798 (Linking)00aMeasuring physical functioning in children with spinal 
impairments with computerized adaptive testing0 aMeasuring physical functioning in children with spinal impairmen a2008/03/26 cApr-May a330-50 v283 aBACKGROUND: The purpose of this study was to assess the utility of measuring current physical functioning status of children with scoliosis and kyphosis by applying computerized adaptive testing (CAT) methods. Computerized adaptive testing uses a computer interface to administer the most optimal items based on previous responses, reducing the number of items needed to obtain a scoring estimate. METHODS: This was a prospective study of 77 subjects (0.6-19.8 years) who were seen by a spine surgeon during a routine clinic visit for progress spine deformity. Using a multidimensional version of the Pediatric Evaluation of Disability Inventory CAT program (PEDI-MCAT), we evaluated content range, accuracy and efficiency, known-group validity, concurrent validity with the Pediatric Outcomes Data Collection Instrument, and test-retest reliability in a subsample (n = 16) within a 2-week interval. RESULTS: We found the PEDI-MCAT to have sufficient item coverage in both self-care and mobility content for this sample, although most patients tended to score at the higher ends of both scales. Both the accuracy of PEDI-MCAT scores as compared with a fixed format of the PEDI (r = 0.98 for both mobility and self-care) and test-retest reliability were very high [self-care: intraclass correlation (3,1) = 0.98, mobility: intraclass correlation (3,1) = 0.99]. The PEDI-MCAT took an average of 2.9 minutes for the parents to complete. The PEDI-MCAT detected expected differences between patient groups, and scores on the PEDI-MCAT correlated in expected directions with scores from the Pediatric Outcomes Data Collection Instrument domains. 
CONCLUSIONS: Use of the PEDI-MCAT to assess the physical functioning status, as perceived by parents of children with complex spinal impairments, seems to be feasible and achieves accurate and efficient estimates of self-care and mobility function. Additional item development will be needed at the higher functioning end of the scale to avoid ceiling effects for older children. LEVEL OF EVIDENCE: This is a level II prospective study designed to establish the utility of computer adaptive testing as an evaluation method in a busy pediatric spine practice.10a*Disability Evaluation10aAdolescent10aChild10aChild, Preschool10aComputer Simulation10aCross-Sectional Studies10aDisabled Children/*rehabilitation10aFemale10aHumans10aInfant10aKyphosis/*diagnosis/rehabilitation10aMale10aProspective Studies10aReproducibility of Results10aScoliosis/*diagnosis/rehabilitation1 aMulcahey, M J1 aHaley, S M1 aDuffy, T1 aPengsheng, N1 aBetz, R R uhttp://mail.iacat.org/content/measuring-physical-functioning-children-spinal-impairments-computerized-adaptive-testing01948nas a2200277 4500008004100000020004100041245007300082210006900155250001500224260000800239300001000247490000700257520099700264653001601261653002901277653004801306653006201354653001101416653002401427653004601451653003101497653001301528100001401541700001501555856010001570 2008 eng d a0007-1102 (Print)0007-1102 (Linking)00aPredicting item exposure parameters in computerized adaptive testing0 aPredicting item exposure parameters in computerized adaptive tes a2008/05/17 cMay a75-910 v613 aThe purpose of this study is to find a formula that describes the relationship between item exposure parameters and item parameters in computerized adaptive tests by using genetic programming (GP) - a biologically inspired artificial intelligence technique. Based on the formula, item exposure parameters for new parallel item pools can be predicted without conducting additional iterative simulations. 
Results show that an interesting formula relating item exposure parameters to item parameters in a pool can be found using GP. The item exposure parameters predicted from this formula were close to those observed from the Sympson and Hetter (1985) procedure and performed well in controlling item exposure rates. Similar results were observed for the Stocking and Lewis (1998) multinomial model for item selection and for the Sympson and Hetter procedure with content balancing. The proposed GP approach provides a knowledge-based solution for finding item exposure parameters.
This strategy permits substantial improvements in exposure control at little cost to measurement accuracy. However, we do not know whether this option provides better results than using the master bank with greater restriction in the maximum exposure rates (Sympson & Hetter, 1985). In order to investigate this issue, we worked with several simulated banks of 2100 items, comparing them, for RMSE and overlap rate, with the same banks divided into two, three... up to seven sub-banks. By means of extensive manipulation of the maximum exposure rate in each bank, we found that the option of rotating banks slightly outperformed the option of restricting the maximum exposure rate of the master bank by means of the Sympson-Hetter method.
10a*Character10a*Databases10a*Software Design10aAptitude Tests/*statistics & numerical data10aBias (Epidemiology)10aComputing Methodologies10aDiagnosis, Computer-Assisted/*statistics & numerical data10aEducational Measurement/*statistics & numerical data10aHumans10aMathematical Computing10aPsychometrics/statistics & numerical data1 aBarrada, J R1 aOlea, J1 aAbad, F J uhttp://mail.iacat.org/content/rotating-item-banks-versus-restriction-maximum-exposure-rates-computerized-adaptive-testing03158nas a2200493 4500008004100000020002200041245008900063210006900152250001500221260000800236300001000244490000700254520169600261653003401957653002001991653001502011653001002026653000902036653002602045653003202071653003102103653001102134653001102145653000902156653003202165653001602197653002902213653004402242653002902286653003102315653003102346653001702377100001702394700001402411700001602425700001302441700001702454700002202471700001702493700001402510700001402524700001702538856010902555 2008 eng d a1075-2730 (Print)00aUsing computerized adaptive testing to reduce the burden of mental health assessment0 aUsing computerized adaptive testing to reduce the burden of ment a2008/04/02 cApr a361-80 v593 aOBJECTIVE: This study investigated the combination of item response theory and computerized adaptive testing (CAT) for psychiatric measurement as a means of reducing the burden of research and clinical assessments. METHODS: Data were from 800 participants in outpatient treatment for a mood or anxiety disorder; they completed 616 items of the 626-item Mood and Anxiety Spectrum Scales (MASS) at two times. The first administration was used to design and evaluate a CAT version of the MASS by using post hoc simulation. The second confirmed the functioning of CAT in live testing. 
RESULTS: Tests of competing models based on item response theory supported the scale's bifactor structure, consisting of a primary dimension and four group factors (mood, panic-agoraphobia, obsessive-compulsive, and social phobia). Both simulated and live CAT showed a 95% average reduction (585 items) in items administered (24 and 30 items, respectively) compared with administration of the full MASS. The correlation between scores on the full MASS and the CAT version was .93. For the mood disorder subscale, differences in scores between two groups of depressed patients--one with bipolar disorder and one without--on the full scale and on the CAT showed effect sizes of .63 (p<.003) and 1.19 (p<.001) standard deviation units, respectively, indicating better discriminant validity for CAT. CONCLUSIONS: Instead of using small fixed-length tests, clinicians can create item banks with a large item pool, and a small set of the items most relevant for a given individual can be administered with no loss of information, yielding a dramatic reduction in administration time and patient and clinician burden.10a*Diagnosis, Computer-Assisted10a*Questionnaires10aAdolescent10aAdult10aAged10aAgoraphobia/diagnosis10aAnxiety Disorders/diagnosis10aBipolar Disorder/diagnosis10aFemale10aHumans10aMale10aMental Disorders/*diagnosis10aMiddle Aged10aMood Disorders/diagnosis10aObsessive-Compulsive Disorder/diagnosis10aPanic Disorder/diagnosis10aPhobic Disorders/diagnosis10aReproducibility of Results10aTime Factors1 aGibbons, R D1 aWeiss, DJ1 aKupfer, D J1 aFrank, E1 aFagiolini, A1 aGrochocinski, V J1 aBhaumik, D K1 aStover, A1 aBock, R D1 aImmekus, J C uhttp://mail.iacat.org/content/using-computerized-adaptive-testing-reduce-burden-mental-health-assessment02309nas a2200301 
4500008004100000020002200041245011900063210006900182250001500251260000800266300001000274490000700284520125000291653001501541653001001556653006201566653001101628653001101639653000901650653003801659653005601697653004601753653002101799653003101820100001601851700002001867856012001887 2007 eng d a1040-3590 (Print)00aComputerized adaptive personality testing: A review and illustration with the MMPI-2 Computerized Adaptive Version0 aComputerized adaptive personality testing A review and illustrat a2007/03/21 cMar a14-240 v193 aComputerized adaptive testing in personality assessment can improve efficiency by significantly reducing the number of items administered to answer an assessment question. Two approaches have been explored for adaptive testing in computerized personality assessment: item response theory and the countdown method. In this article, the authors review the literature on each and report the results of an investigation designed to explore the utility, in terms of item and time savings, and validity, in terms of correlations with external criterion measures, of an expanded countdown method-based research version of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2), the MMPI-2 Computerized Adaptive Version (MMPI-2-CA). Participants were 433 undergraduate college students (170 men and 263 women). Results indicated considerable item savings and corresponding time savings for the adaptive testing modalities compared with a conventional computerized MMPI-2 administration. Furthermore, computerized adaptive administration yielded comparable results to computerized conventional administration of the MMPI-2 in terms of both test scores and their validity. 
Future directions for computerized adaptive personality testing are discussed.10aAdolescent10aAdult10aDiagnosis, Computer-Assisted/*statistics & numerical data10aFemale10aHumans10aMale10aMMPI/*statistics & numerical data10aPersonality Assessment/*statistics & numerical data10aPsychometrics/statistics & numerical data10aReference Values10aReproducibility of Results1 aForbey, J D1 aBen-Porath, Y S uhttp://mail.iacat.org/content/computerized-adaptive-personality-testing-review-and-illustration-mmpi-2-computerized01949nas a2200301 4500008004500000020001400045245012900059210006900188300001200257490000700269520093500276653002501211653002101236653002501257653003001282653003001312653001001342653001501352653002601367653002501393653002401418653001501442653001501457100001301472700001701485700002101502856012401523 2007 English a0146-621600aComputerized adaptive testing for polytomous motivation items: Administration mode effects and a comparison with short forms0 aComputerized adaptive testing for polytomous motivation items Ad a412-4290 v313 aIn a randomized experiment (n=515), a computerized and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible consequences of model misfit. CAT efficiency was studied by a systematic comparison of the CAT with two types of conventional fixed-length short forms, which are created to be good CAT competitors. Results showed no essential administration mode effects. Efficiency analyses show that CAT outperformed the short forms in almost all aspects when results are aggregated along the latent trait scale. The real and the simulated data results are very similar, which indicates that the real data results are not affected by model misfit. 
(PsycINFO Database Record (c) 2007 APA ) (journal abstract)10a2220 Tests & Testing10aAdaptive Testing10aAttitude Measurement10acomputer adaptive testing10aComputer Assisted Testing10aitems10aMotivation10apolytomous motivation10aStatistical Validity10aTest Administration10aTest Forms10aTest Items1 aHol, A M1 aVorst, H C M1 aMellenbergh, G J uhttp://mail.iacat.org/content/computerized-adaptive-testing-polytomous-motivation-items-administration-mode-effects-and02419nas a2200325 4500008004100000020002200041245008700063210006900150250001500219300001100234490000700245520140100252653001901653653003001672653001901702653003801721653002101759653002001780653001401800653001501814653003301829653001101862653002401873653001801897100001701915700001501932700001501947700001501962856011601977 2007 eng d a0962-9343 (Print)00aDeveloping tailored instruments: item banking and computerized adaptive assessment0 aDeveloping tailored instruments item banking and computerized ad a2007/05/29 a95-1080 v163 aItem banks and Computerized Adaptive Testing (CAT) have the potential to greatly improve the assessment of health outcomes. This review describes the unique features of item banks and CAT and discusses how to develop item banks. In CAT, a computer selects the items from an item bank that are most relevant for and informative about the particular respondent; thus optimizing test relevance and precision. Item response theory (IRT) provides the foundation for selecting the items that are most informative for the particular respondent and for scoring responses on a common metric. The development of an item bank is a multi-stage process that requires a clear definition of the construct to be measured, good items, a careful psychometric analysis of the items, and a clear specification of the final CAT. 
The psychometric analysis needs to evaluate the assumptions of the IRT model such as unidimensionality and local independence; that the items function the same way in different subgroups of the population; and that there is an adequate fit between the data and the chosen item response models. Also, interpretation guidelines need to be established to help the clinical application of the assessment. Although medical research can draw upon expertise from educational testing in the development of item banks and CAT, the medical field also encounters unique opportunities and challenges.10a*Health Status10a*Health Status Indicators10a*Mental Health10a*Outcome Assessment (Health Care)10a*Quality of Life10a*Questionnaires10a*Software10aAlgorithms10aFactor Analysis, Statistical10aHumans10aModels, Statistical10aPsychometrics1 aBjorner, J B1 aChang, C-H1 aThissen, D1 aReeve, B B uhttp://mail.iacat.org/content/developing-tailored-instruments-item-banking-and-computerized-adaptive-assessment01799nas a2200217 4500008004100000020004600041245017800087210006900265260002500334300001200359490000700371520094900378653001201327653004301339653001801382653000901400100001701409700001501426700001501441856012501456 2007 eng d a1062-7197 (Print); 1532-6977 (Electronic)00aThe effect of including pretest items in an operational computerized adaptive test: Do different ability examinees spend different amounts of time on embedded pretest items?0 aeffect of including pretest items in an operational computerized bLawrence Erlbaum: US a161-1730 v123 aThe purpose of this study was to examine the effect of pretest items on response time in an operational, fixed-length, time-limited computerized adaptive test (CAT). These pretest items are embedded within the CAT, but unlike the operational items, are not tailored to the examinee's ability level. 
If examinees with higher ability levels need less time to complete these items than do their counterparts with lower ability levels, they will have more time to devote to the operational test questions. Data were from a graduate admissions test that was administered worldwide. Data from both quantitative and verbal sections of the test were considered. For the verbal section, examinees in the lower ability groups spent systematically more time on their pretest items than did those in the higher ability groups, though for the quantitative section the differences were less clear. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10aability10aoperational computerized adaptive test10apretest items10atime1 aFerdous, A A1 aPlake, B S1 aChang, S-R uhttp://mail.iacat.org/content/effect-including-pretest-items-operational-computerized-adaptive-test-do-different-ability02745nas a2200541 4500008004100000020002200041245017000063210006900233250001500302260000800317300001100325490000700336520116200343653001901505653002501524653002101549653002101570653001501591653001001606653000901616653001601625653002301641653003201664653001101696653001101707653000901718653001601727653004601743653001801789653002901807653001801836100001501854700001401869700001701883700001301900700001501913700001601928700001501944700001701959700001401976700001801990700001102008700001602019700001502035700001302050700001302063856012702076 2007 eng d a0025-7079 (Print)00aPsychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS)0 aPsychometric evaluation and calibration of healthrelated quality a2007/04/20 cMay aS22-310 v453 aBACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. 
OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment.10a*Health Status10a*Information Systems10a*Quality of Life10a*Self Disclosure10aAdolescent10aAdult10aAged10aCalibration10aDatabases as Topic10aEvaluation Studies as Topic10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aPsychometrics10aQuestionnaires/standards10aUnited States1 aReeve, B B1 aHays, R D1 aBjorner, J B1 aCook, KF1 aCrane, P K1 aTeresi, J A1 aThissen, D1 aRevicki, D A1 aWeiss, DJ1 aHambleton, RK1 aLiu, H1 aGershon, RC1 aReise, S P1 aLai, J S1 aCella, D uhttp://mail.iacat.org/content/psychometric-evaluation-and-calibration-health-related-quality-life-item-banks-plans-patient02177nas a2200181 4500008004100000020002200041245006200063210006200125260003800187300001200225490000700237520155900244653002901803653003401832653001801866100002201884856008901906 2006 eng d a0033-3018 (Print)00aAdaptive success control in computerized adaptive testing0 aAdaptive success control in computerized adaptive testing bPabst Science Publishers: Germany a436-4500 v483 aIn computerized adaptive testing (CAT) procedures within the framework of 
probabilistic test theory the difficulty of an item is adjusted to the ability of the respondent, with the aim of maximizing the amount of information generated per item, thereby also increasing test economy and test reasonableness. However, earlier research indicates that respondents might feel over-challenged by a constant success probability of p = 0.5 and therefore cannot come to a sufficiently high answer certainty within a reasonable timeframe. Consequently response time per item increases, which -- depending on the test material -- can outweigh the benefit of administering optimally informative items. Instead of a benefit, the result of using CAT procedures could be a loss of test economy. Based on this problem, an adaptive success control algorithm was designed and tested, adapting the success probability to the working style of the respondent. Persons who need higher answer certainty in order to come to a decision are detected and receive a higher success probability, in order to minimize the test duration (not the number of items as in classical CAT). The method is validated on the re-analysis of data from the Adaptive Matrices Test (AMT, Hornke, Etzel & Rettig, 1999) and by the comparison between an AMT version using classical CAT and an experimental version using Adaptive Success Control. The results are discussed in the light of psychometric and psychological aspects of test quality. 
(PsycINFO Database Record (c) 2007 APA, all rights reserved)10aadaptive success control10acomputerized adaptive testing10aPsychometrics1 aHäusler, Joachim uhttp://mail.iacat.org/content/adaptive-success-control-computerized-adaptive-testing01638nas a2200217 4500008004100000020004600041245008900087210006900176260002500245300000900270490000700279520083400286653001901120653002801139653003001167653003301197653002101230653003501251100001801286856011601304 2006 eng d a0895-7347 (Print); 1532-4818 (Electronic)00aApplying Bayesian item selection approaches to adaptive tests using polytomous items0 aApplying Bayesian item selection approaches to adaptive tests us bLawrence Erlbaum: US a1-200 v193 aThis study applied the maximum expected information (MEI) and the maximum posterior- weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability estimation using the MEI and MPI approaches to the traditional maximal item information (MII) approach. The results of the simulation study indicated that the MEI and MPI approaches led to a superior efficiency of ability estimation compared with the MII approach. The superiority of the MEI and MPI approaches over the MII approach was greatest when the bank contained items having a relatively peaked information function. 
(PsycINFO Database Record (c) 2007 APA, all rights reserved)10aadaptive tests10aBayesian item selection10acomputer adaptive testing10amaximum expected information10apolytomous items10aposterior weighted information1 aPenfield, R D uhttp://mail.iacat.org/content/applying-bayesian-item-selection-approaches-adaptive-tests-using-polytomous-items01964nas a2200265 4500008004100000020002200041245008200063210006900145260002600214300001000240490000700250520112900257653001501386653003401401653001401435653001701449653002401466653001501490653002201505653001501527100002301542700001301565700001801578856010201596 2006 eng d a1076-9986 (Print)00aAssembling a computerized adaptive testing item pool as a set of linear tests0 aAssembling a computerized adaptive testing item pool as a set of bSage Publications: US a81-990 v313 aTest-item writing efforts typically results in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, that violate the content constraints, and/or have unfavorable exposure rates. Although at first sight somewhat counterintuitive, it is shown that if the CAT pool is assembled as a set of linear test forms, undesirable correlations can be broken down effectively. It is proposed to assemble such pools using a mixed integer programming model with constraints that guarantee that each test meets all content specifications and an objective function that requires them to have maximal information at a well-chosen set of ability values. An empirical example with a previous master pool from the Law School Admission Test (LSAT) yielded a CAT with nearly uniform bias and mean-squared error functions for the ability estimator and item-exposure rates that satisfied the target for all items in the pool. 
10aAlgorithms10acomputerized adaptive testing10aitem pool10alinear tests10amathematical models10astatistics10aTest Construction10aTest Items1 avan der Linden, WJ1 aAriel, A1 aVeldkamp, B P uhttp://mail.iacat.org/content/assembling-computerized-adaptive-testing-item-pool-set-linear-tests02653nas a2200397 4500008004100000020002200041245013500063210006900198250001500267260000800282300001200290490000700302520140700309653002601716653003101742653001501773653001001788653000901798653002201807653002501829653003301854653001101887653001101898653000901909653001601918653004601934653003001980653003102010653001302041100001502054700001002069700001802079700001602097700001502113856012702128 2006 eng d a0895-4356 (Print)00aComputer adaptive testing improved accuracy and precision of scores over random item selection in a physical functioning item bank0 aComputer adaptive testing improved accuracy and precision of sco a2006/10/10 cNov a1174-820 v593 aBACKGROUND AND OBJECTIVE: Measuring physical functioning (PF) within and across postacute settings is critical for monitoring outcomes of rehabilitation; however, most current instruments lack sufficient breadth and feasibility for widespread use. Computer adaptive testing (CAT), in which item selection is tailored to the individual patient, holds promise for reducing response burden, yet maintaining measurement precision. We calibrated a PF item bank via item response theory (IRT), administered items with a post hoc CAT design, and determined whether CAT would improve accuracy and precision of score estimates over random item selection. METHODS: 1,041 adults were interviewed during postacute care rehabilitation episodes in either hospital or community settings. Responses for 124 PF items were calibrated using IRT methods to create a PF item bank. We examined the accuracy and precision of CAT-based scores compared to a random selection of items. 
RESULTS: CAT-based scores had higher correlations with the IRT-criterion scores, especially with short tests, and resulted in narrower confidence intervals than scores based on a random selection of items; gains, as expected, were especially large for low and high performing adults. CONCLUSION: The CAT design may have important precision and efficiency advantages for point-of-care functional assessment in rehabilitation practice settings.10a*Recovery of Function10aActivities of Daily Living10aAdolescent10aAdult10aAged10aAged, 80 and over10aConfidence Intervals10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aRehabilitation/*standards10aReproducibility of Results10aSoftware1 aHaley, S M1 aNi, P1 aHambleton, RK1 aSlavin, M D1 aJette, A M uhttp://mail.iacat.org/content/computer-adaptive-testing-improved-accuracy-and-precision-scores-over-random-item-selectio-001471nas a2200205 4500008004100000245002700041210002600068260006000094300001100154490000800165520087300173653005101046653003001097653002001127653001801147653001301165100001501178700001501193856005701208 2006 eng d00aComputer-based testing0 aComputerbased testing aWashington D.C. USAbAmerican Psychological Association a87-1000 vxiv3 a(From the chapter) There has been a proliferation of research designed to explore and exploit opportunities provided by computer-based assessment. This chapter provides an overview of the diverse efforts by researchers in this area. It begins by describing how paper-and-pencil tests can be adapted for administration by computers. Computerization provides the important advantage that items can be selected so they are of appropriate difficulty for each examinee. Some of the psychometric theory needed for computerized adaptive testing is reviewed. Then research on innovative computerized assessments is summarized. These assessments go beyond multiple-choice items by using formats made possible by computerization. 
Then some hardware and software issues are described, and finally, directions for future work are outlined. (PsycINFO Database Record (c) 2006 APA )10aAdaptive Testing computerized adaptive testing10aComputer Assisted Testing10aExperimentation10aPsychometrics10aTheories1 aDrasgow, F1 aChuah, S C uhttp://mail.iacat.org/content/computer-based-testing03330nas a2200469 4500008004100000020002200041245011600063210006900179250001500248260000800263300001200271490000700283520189400290653003202184653003102216653002202247653002002269653001002289653000902299653002202308653002802330653003302358653001102391653001102402653002502413653000902438653001602447653004602463653002202509653002402531653003002555653002902585100001502614700001502629700001602644700001102660700002402671700001402695700001802709700001002727856012302737 2006 eng d a0003-9993 (Print)00aComputerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes0 aComputerized adaptive testing for followup after discharge from a2006/08/01 cAug a1033-420 v873 aOBJECTIVE: To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home. DESIGN: Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit. SETTING: Follow-up visits conducted in patients' home setting. PARTICIPANTS: Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions. INTERVENTIONS: Not applicable. 
MAIN OUTCOME MEASURES: Summary scores from AM-PAC-CAT, including 3 activity domains of movement and physical, personal care and instrumental, and applied cognition were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66). RESULTS: AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77-.86) with scores from the AM-PAC-66. On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. The standardized response mean (SRM) was greater for the movement and physical fixed form than the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity between CAT and fixed formats. Using patients' own report as an anchor-based measure of change, the CAT and fixed length formats were comparable in responsiveness to patient-reported change over a 3-month interval. CONCLUSIONS: Accurate estimates for functional activity group-level changes can be obtained from CAT administrations, with a considerable reduction in administration time.10a*Activities of Daily Living10a*Adaptation, Physiological10a*Computer Systems10a*Questionnaires10aAdult10aAged10aAged, 80 and over10aChi-Square Distribution10aFactor Analysis, Statistical10aFemale10aHumans10aLongitudinal Studies10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aPatient Discharge10aProspective Studies10aRehabilitation/*standards10aSubacute Care/*standards1 aHaley, S M1 aSiebens, H1 aCoster, W J1 aTao, W1 aBlack-Schaffer, R M1 aGandek, B1 aSinclair, S J1 aNi, P uhttp://mail.iacat.org/content/computerized-adaptive-testing-follow-after-discharge-inpatient-rehabilitation-i-activity03167nas a2200361 
4500008004100000020002200041245013600063210006900199250001500268260000800283300001200291490000700303520206800310653001502378653002402393653002102417653001002438653000902448653002902457653003402486653002402520653001102544653001102555653001302566653000902579653001602588100001602604700001302620700002302633700001202656700001102668856012602679 2006 eng d a0962-9343 (Print)00aComputerized adaptive testing of diabetes impact: a feasibility study of Hispanics and non-Hispanics in an active clinic population0 aComputerized adaptive testing of diabetes impact a feasibility s a2006/10/13 cNov a1503-180 v153 aBACKGROUND: Diabetes is a leading cause of death and disability in the US and is twice as common among Hispanic Americans as non-Hispanics. The societal costs of diabetes provide an impetus for developing tools that can improve patient care and delay or prevent diabetes complications. METHODS: We implemented a feasibility study of a Computerized Adaptive Test (CAT) to measure diabetes impact using a sample of 103 English- and 97 Spanish-speaking patients (mean age = 56.5, 66.5% female) in a community medical center with a high proportion of minority patients (28% African-American). The 37 items of the Diabetes Impact Survey were translated using forward-backward translation and cognitive debriefing. Participants were randomized to receive either the full-length tool or the Diabetes-CAT first, in the patient's native language. RESULTS: The number of items and the amount of time to complete the survey for the CAT was reduced to one-sixth the amount for the full-length tool in both languages, across disease severity. Confirmatory Factor Analysis confirmed that the Diabetes Impact Survey is unidimensional. 
The Diabetes-CAT demonstrated acceptable internal consistency reliability, construct validity, and discriminant validity in the overall sample, although subgroup analyses suggested that the English sample data evidenced higher levels of reliability and validity than the Spanish sample, as well as issues with discriminant validity in the Spanish sample. Differential Item Functioning analysis revealed differences in response tendencies by language group in 3 of the 37 items. Participant interviews suggested that the Spanish-speaking patients generally preferred the paper survey to the computer-assisted tool, and were twice as likely to experience difficulties understanding the items. CONCLUSIONS: While the Diabetes-CAT demonstrated clear advantages in reducing respondent burden as compared to the full-length tool, simplifying the item bank will be necessary for enhancing the feasibility of the Diabetes-CAT for use with low literacy patients.10a*Computers10a*Hispanic Americans10a*Quality of Life10aAdult10aAged10aData Collection/*methods10aDiabetes Mellitus/*psychology10aFeasibility Studies10aFemale10aHumans10aLanguage10aMale10aMiddle Aged1 aSchwartz, C1 aWelch, G1 aSantiago-Kelley, P1 aBode, R1 aSun, X uhttp://mail.iacat.org/content/computerized-adaptive-testing-diabetes-impact-feasibility-study-hispanics-and-non-hispanics02161nas a2200289 4500008004100000245010000041210006900141260000800210300001200218490000700230520127700237653003401514653002101548653000901569653001201578653002201590653001101612653001101623653000901634653001601643653002901659653001901688100001301707700001501720700001301735856012301748 2006 eng d00aFactor analysis techniques for assessing sufficient unidimensionality of cancer related fatigue0 aFactor analysis techniques for assessing sufficient unidimension cSep a1179-900 v153 aBACKGROUND: Fatigue is the most common unrelieved symptom experienced by people with cancer. 
The purpose of this study was to examine whether cancer-related fatigue (CRF) can be summarized using a single score, that is, whether CRF is sufficiently unidimensional for measurement approaches that require or assume unidimensionality. We evaluated this question using factor analysis techniques including the theory-driven bi-factor model. METHODS: Five hundred and fifty-five cancer patients from the Chicago metropolitan area completed a 72-item fatigue item bank, covering a range of fatigue-related concerns including intensity, frequency and interference with physical, mental, and social activities. Dimensionality was assessed using exploratory and confirmatory factor analysis (CFA) techniques. RESULTS: Exploratory factor analysis (EFA) techniques identified from 1 to 17 factors. The bi-factor model suggested that CRF was sufficiently unidimensional. CONCLUSIONS: CRF can be considered sufficiently unidimensional for applications that require unidimensionality. One such application, item response theory (IRT), will facilitate the development of short-form and computer-adaptive testing. 
This may further enable practical and accurate clinical assessment of CRF.10a*Factor Analysis, Statistical10a*Quality of Life10aAged10aChicago10aFatigue/*etiology10aFemale10aHumans10aMale10aMiddle Aged10aNeoplasms/*complications10aQuestionnaires1 aLai, J-S1 aCrane, P K1 aCella, D uhttp://mail.iacat.org/content/factor-analysis-techniques-assessing-sufficient-unidimensionality-cancer-related-fatigue03120nas a2200277 4500008004100000020002200041245010900063210006900172250001500241260000800256300001200264490000700276520221700283653002902500653002002529653002502549653002102574653001502595653002802610653001102638653002502649100001702674700001502691700001202706856012402718 2006 eng d a0214-9915 (Print)00aMaximum information stratification method for controlling item exposure in computerized adaptive testing0 aMaximum information stratification method for controlling item e a2007/02/14 cFeb a156-1590 v183 aThe proposal for increasing the security in Computerized Adaptive Tests that has received most attention in recent years is the a-stratified method (AS - Chang and Ying, 1999): at the beginning of the test only items with low discrimination parameters (a) can be administered, with the values of the a parameters increasing as the test goes on. With this method, distribution of the exposure rates of the items is less skewed, while efficiency is maintained in trait-level estimation. The pseudo-guessing parameter (c), present in the three-parameter logistic model, is considered irrelevant, and is not used in the AS method. The Maximum Information Stratified (MIS) model incorporates the c parameter in the stratification of the bank and in the item-selection rule, improving accuracy by comparison with the AS, for item banks with a and b parameters correlated and uncorrelated. 
For both kinds of banks, the blocking b methods (Chang, Qian and Ying, 2001) improve the security of the item bank. Método de estratificación por máxima información para el control de la exposición en tests adaptativos informatizados. La propuesta para aumentar la seguridad en los tests adaptativos informatizados que ha recibido más atención en los últimos años ha sido el método a-estratificado (AE - Chang y Ying, 1999): en los momentos iniciales del test sólo pueden administrarse ítems con bajos parámetros de discriminación (a), incrementándose los valores del parámetro a admisibles según avanza el test. Con este método la distribución de las tasas de exposición de los ítems es más equilibrada, manteniendo una adecuada precisión en la medida. El parámetro de pseudoadivinación (c), presente en el modelo logístico de tres parámetros, se supone irrelevante y no se incorpora en el AE. El método de Estratificación por Máxima Información (EMI) incorpora el parámetro c a la estratificación del banco y a la regla de selección de ítems, mejorando la precisión en comparación con AE, tanto para bancos donde los parámetros a y b correlacionan como para bancos donde no. 
Para ambos tipos de bancos, los métodos de bloqueo de b (Chang, Qian y Ying, 2001) mejoran la seguridad del banco.10a*Artificial Intelligence10a*Microcomputers10a*Psychological Tests10a*Software Design10aAlgorithms10aChi-Square Distribution10aHumans10aLikelihood Functions1 aBarrada, J R1 aMazuela, P1 aOlea, J uhttp://mail.iacat.org/content/maximum-information-stratification-method-controlling-item-exposure-computerized-adaptive02568nas a2200349 4500008004100000020002200041245016600063210006900229250001500298260000800313300001100321490000700332520142900339653002701768653001601795653001501811653001001826653002101836653001401857653005201871653001501923653001101938653001101949653003701960653001801997653001402015100001502029700001002044700001602054700002502070856012302095 2006 eng d a0003-9993 (Print)00aMeasurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the pediatric evaluation of disability inventory0 aMeasurement precision and efficiency of multidimensional compute a2006/08/29 cSep a1223-90 v873 aOBJECTIVE: To compare the measurement efficiency and precision of a multidimensional computer adaptive testing (M-CAT) application to a unidimensional CAT (U-CAT) comparison using item bank data from 2 of the functional skills scales of the Pediatric Evaluation of Disability Inventory (PEDI). DESIGN: Using existing PEDI mobility and self-care item banks, we compared the stability of item calibrations and model fit between unidimensional and multidimensional Rasch models and compared the efficiency and precision of the U-CAT- and M-CAT-simulated assessments to a random draw of items. SETTING: Pediatric rehabilitation hospital and clinics. PARTICIPANTS: Clinical and normative samples. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Not applicable. 
RESULTS: The M-CAT had greater levels of precision and efficiency than the separate mobility and self-care U-CAT versions when using a similar number of items for each PEDI subdomain. Equivalent estimation of mobility and self-care scores can be achieved with a 25% to 40% item reduction with the M-CAT compared with the U-CAT. CONCLUSIONS: M-CAT applications appear to have both precision and efficiency advantages compared with separate U-CAT assessments when content subdomains have a high correlation. Practitioners may also realize interpretive advantages of reporting test score information for each subdomain when separate clinical inferences are desired.10a*Disability Evaluation10a*Pediatrics10aAdolescent10aChild10aChild, Preschool10aComputers10aDisabled Persons/*classification/rehabilitation10aEfficiency10aHumans10aInfant10aOutcome Assessment (Health Care)10aPsychometrics10aSelf Care1 aHaley, S M1 aNi, P1 aLudlow, L H1 aFragala-Pinkham, M A uhttp://mail.iacat.org/content/measurement-precision-and-efficiency-multidimensional-computer-adaptive-testing-physical02406nas a2200337 4500008004100000020002200041245010800063210006900171250001500240260000800255300001100263490000700274520139300281653002101674653002101695653001001716653001101726653001801737653001101755653000901766653001601775653003001791653002801821100001801849700001701867700001801884700001401902700001701916700001701933856011801950 2006 eng d a0962-9343 (Print)00aMultidimensional computerized adaptive testing of the EORTC QLQ-C30: basic developments and evaluations0 aMultidimensional computerized adaptive testing of the EORTC QLQC a2006/03/21 cApr a315-290 v153 aOBJECTIVE: Self-report questionnaires are widely used to measure health-related quality of life (HRQOL). Ideally, such questionnaires should be adapted to the individual patient and at the same time scores should be directly comparable across patients. This may be achieved using computerized adaptive testing (CAT). 
Usually, CAT is carried out for a single domain at a time. However, many HRQOL domains are highly correlated. Multidimensional CAT may utilize these correlations to improve measurement efficiency. We investigated the possible advantages and difficulties of multidimensional CAT. STUDY DESIGN AND SETTING: We evaluated multidimensional CAT of three scales from the EORTC QLQ-C30: the physical functioning, emotional functioning, and fatigue scales. Analyses utilised a database with 2958 European cancer patients. RESULTS: It was possible to obtain scores for the three domains with five to seven items administered using multidimensional CAT that were very close to the scores obtained using all 12 items and with no or little loss of measurement precision. CONCLUSION: The findings suggest that multidimensional CAT may significantly improve measurement precision and efficiency and encourage further research into multidimensional CAT. Particularly, the estimation of the model underlying the multidimensional CAT and the conceptual aspects need further investigations.10a*Quality of Life10a*Self Disclosure10aAdult10aFemale10aHealth Status10aHumans10aMale10aMiddle Aged10aQuestionnaires/*standards10aUser-Computer Interface1 aPetersen, M A1 aGroenvold, M1 aAaronson, N K1 aFayers, P1 aSprangers, M1 aBjorner, J B uhttp://mail.iacat.org/content/multidimensional-computerized-adaptive-testing-eortc-qlq-c30-basic-developments-and02434nas a2200241 4500008004100000020004600041245008600087210006900173260002500242300001200267490000700279520159900286653001801885653002401903653002001927653002801947653002101975653004001996653001402036100001802050700001202068856011202080 2006 eng d a0895-7347 (Print); 1532-4818 (Electronic)00aOptimal and nonoptimal computer-based test designs for making pass-fail decisions0 aOptimal and nonoptimal computerbased test designs for making pas bLawrence Erlbaum: US a221-2390 v193 aNow that many credentialing exams are being routinely administered by computer, new 
computer-based test designs, along with item response theory models, are being aggressively researched to identify specific designs that can increase the decision consistency and accuracy of pass-fail decisions. The purpose of this study was to investigate the impact of optimal and nonoptimal multistage test (MST) designs, linear parallel-form test designs (LPFT), and computer adaptive test (CAT) designs on the decision consistency and accuracy of pass-fail decisions. Realistic testing situations matching those of one of the large credentialing agencies were simulated to increase the generalizability of the findings. The conclusions were clear: (a) With the LPFTs, matching test information functions (TIFs) to the mean of the proficiency distribution produced slightly better results than matching them to the passing score; (b) all of the test designs worked better than test construction using random selection of items, subject to content constraints only; (c) CAT performed better than the other test designs; and (d) if matching a TIF to the passing score, the MST design produced slightly better results than the LPFT design. If an argument for the MST design is to be made, it can be made on the basis of slight improvements over the LPFT design and better expected item bank utilization, candidate preference, and the potential for improved diagnostic feedback, compared with the feedback that is possible with fixed linear test forms. 
(PsycINFO Database Record (c) 2007 APA, all rights reserved)10aadaptive test10acredentialing exams10aDecision Making10aEducational Measurement10amultistage tests10aoptimal computer-based test designs10atest form1 aHambleton, RK1 aXing, D uhttp://mail.iacat.org/content/optimal-and-nonoptimal-computer-based-test-designs-making-pass-fail-decisions02654nas a2200409 4500008004100000245013400041210006900175300001000244490000700254520123100261653002501492653003201517653003101549653001001580653000901590653002201599653003301621653001101654653001101665653000901676653001601685653002401701653003101725653004101756653004501797653006801842653006101910653003001971653002802001653002202029100001402051700001302065700001802078700001402096700001502110856011902125 2006 eng d00aSimulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function0 aSimulated computerized adaptive test for patients with shoulder a290-80 v593 aBACKGROUND AND OBJECTIVE: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (theta(IRT)) and measures generated using the simulated CAT (theta(CAT)). STUDY DESIGN AND SETTING: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items. RESULTS: Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. 
The theta(IRT) and theta(CAT) measures were highly correlated (r = .96) and resulted in similar classifications of patients. CONCLUSION: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability.10a*Computer Simulation10a*Range of Motion, Articular10aActivities of Daily Living10aAdult10aAged10aAged, 80 and over10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMiddle Aged10aProspective Studies10aReproducibility of Results10aResearch Support, N.I.H., Extramural10aResearch Support, U.S. Gov't, Non-P.H.S.10aShoulder Dislocation/*physiopathology/psychology/rehabilitation10aShoulder Pain/*physiopathology/psychology/rehabilitation10aShoulder/*physiopathology10aSickness Impact Profile10aTreatment Outcome1 aHart, D L1 aCook, KF1 aMioduski, J E1 aTeal, C R1 aCrane, P K uhttp://mail.iacat.org/content/simulated-computerized-adaptive-test-patients-shoulder-impairments-was-efficient-and02252nas a2200301 4500008004500000020001400045245009800059210007100157300001200228490000700240520127200247653003201519653002601551653002801577653001801605653002501623653001601648653001201664653001501676653001801691653001601709653001801725653002701743653001101770100001901781700001601800856013401816 2006 Spanish a0212-972800aTécnicas para detectar patrones de respuesta atípicos [Aberrant patterns detection methods]0 aTécnicas para detectar patrones de respuesta atípicos Aberrant p a143-1540 v223 aLa identificación de patrones de respuesta atípicos es de gran utilidad para la construcción de tests y de bancos de ítems con propiedades psicométricas así como para el análisis de validez de los mismos. 
En este trabajo de revisión se han recogido los más relevantes y novedosos métodos de ajuste de personas que se han elaborado dentro de cada uno de los principales ámbitos de trabajo de la Psicometría: el escalograma de Guttman, la Teoría Clásica de Tests (TCT), la Teoría de la Generalizabilidad (TG), la Teoría de Respuesta al Ítem (TRI), los Modelos de Respuesta al Ítem No Paramétricos (MRINP), los Modelos de Clase Latente de Orden Restringido (MCL-OR) y el Análisis de Estructura de Covarianzas (AEC). Aberrant pattern detection is very useful for constructing tests and item banks with sound psychometric properties and for analyzing the validity of tests and items. The most relevant and newest person-fit methods have been reviewed. All of them were developed within one of the main areas of psychometrics: Guttman's scalogram, Classical Test Theory (CTT), Generalizability Theory (GT), Item Response Theory (IRT), Non-parametric Response Models (NPRM), Order-Restricted Latent Class Models (OR-LCM) and Covariance Structure Analysis (CSA).10aaberrant patterns detection10aClassical Test Theory10ageneralizability theory10aItem Response10aItem Response Theory10aMathematics10amethods10aperson-fit10aPsychometrics10apsychometry10aTest Validity10atest validity analysis10aTheory1 aNúñez, R M N1 aPina, J A L uhttp://mail.iacat.org/content/t%C3%A9cnicas-para-detectar-patrones-de-respuesta-at%C3%ADpicos-aberrant-patterns-detection-methods03124nas a2200385 4500008004100000020002200041245012900063210006900192250001500261260000800276300001000284490000700294520188400301653002502185653002702210653001502237653001002252653002102262653002802283653003802311653001102349653001102360653001102371653000902382653004602391653002702437653003002464653003202494100001502526700001602541700001602557700001502573700002502588856012502613 2005 eng d a0003-9993 (Print)00aAssessing mobility 
in children using a computer adaptive testing a2005/05/17 cMay a932-90 v863 aOBJECTIVE: To assess score agreement, validity, precision, and response burden of a prototype computerized adaptive testing (CAT) version of the Mobility Functional Skills Scale (Mob-CAT) of the Pediatric Evaluation of Disability Inventory (PEDI) as compared with the full 59-item version (Mob-59). DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; and cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics, community-based day care, preschool, and children's homes. PARTICIPANTS: Four hundred sixty-nine children with disabilities and 412 children with no disabilities (analytic sample); 41 children without disabilities and 39 with disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from a prototype Mob-CAT application and versions using 15-, 10-, and 5-item stopping rules; scores from the Mob-59; and number of items and time (in seconds) to administer assessments. RESULTS: Mob-CAT scores from both computer simulations (intraclass correlation coefficient [ICC] range, .94-.99) and field administrations (ICC=.98) were in high agreement with scores from the Mob-59. Using computer simulations of retrospective data, discriminant validity, and sensitivity to change of the Mob-CAT closely approximated that of the Mob-59, especially when using the 15- and 10-item stopping rule versions of the Mob-CAT. The Mob-CAT used no more than 15% of the items for any single administration, and required 20% of the time needed to administer the Mob-59. 
CONCLUSIONS: Comparable score estimates for the PEDI mobility scale can be obtained from CAT administrations, with losses in validity and precision for shorter forms, but with a considerable reduction in administration time.10a*Computer Simulation10a*Disability Evaluation10aAdolescent10aChild10aChild, Preschool10aCross-Sectional Studies10aDisabled Children/*rehabilitation10aFemale10aHumans10aInfant10aMale10aOutcome Assessment (Health Care)/*methods10aRehabilitation Centers10aRehabilitation/*standards10aSensitivity and Specificity1 aHaley, S M1 aRaczek, A E1 aCoster, W J1 aDumas, H M1 aFragala-Pinkham, M A uhttp://mail.iacat.org/content/assessing-mobility-children-using-computer-adaptive-testing-version-pediatric-evaluation-001374nas a2200193 4500008004100000245005000041210004700091300001000138490000600148520081200154653001700966653002100983653002601004653002501030100001501055700001401070700002301084856007301107 2005 eng d00aAn Authoring Environment for Adaptive Testing0 aAuthoring Environment for Adaptive Testing a66-760 v83 aSIETTE is a web-based adaptive testing system. It implements Computerized Adaptive Tests. These tests are tailor-made, theory-based tests, where questions shown to students, finalization of the test, and student knowledge estimation is accomplished adaptively. To construct these tests, SIETTE has an authoring environment comprising a suite of tools that helps teachers create questions and tests properly, and analyze students’ performance after taking a test. In this paper, we present this authoring environment in the
framework of adaptive testing. As will be shown, this set of visual tools, which contains some adaptable features, can be useful for teachers lacking skills in this kind of testing. Additionally, other systems that implement adaptive testing will be studied.
10aAdaptability10aAdaptive Testing10aAuthoring environment10aItem Response Theory1 aGuzmán, E1 aConejo, R1 aGarcía-Hervás, E uhttp://mail.iacat.org/content/authoring-environment-adaptive-testing02153nas a2200229 4500008004100000020002200041245008700063210006900150260004100219300001200260490000700272520135500279653002101634653001501655653002401670653002601694653002501720653002101745653003101766100002301797856010301820 2005 eng d a0022-0655 (Print)00aA comparison of item-selection methods for adaptive tests with content constraints0 acomparison of itemselection methods for adaptive tests with cont bBlackwell Publishing: United Kingdom a283-3020 v423 aIn test assembly, a fundamental difference exists between algorithms that select a test sequentially or simultaneously. Sequential assembly allows us to optimize an objective function at the examinee's ability estimate, such as the test information function in computerized adaptive testing. But it leads to the non-trivial problem of how to realize a set of content constraints on the test—a problem more naturally solved by a simultaneous item-selection method. Three main item-selection methods in adaptive testing offer solutions to this dilemma. The spiraling method moves item selection across categories of items in the pool proportionally to the numbers needed from them. Item selection by the weighted-deviations method (WDM) and the shadow test approach (STA) is based on projections of the future consequences of selecting an item. These two methods differ in that the former calculates a projection of a weighted sum of the attributes of the eventual test and the latter a projection of the test itself. The pros and cons of these methods are analyzed. 
An empirical comparison between the WDM and STA was conducted for an adaptive version of the Law School Admission Test (LSAT), which showed equally good item-exposure rates but violations of some of the constraints and larger bias and inaccuracy of the ability estimator for the WDM.10aAdaptive Testing10aAlgorithms10acontent constraints10aitem selection method10ashadow test approach10aspiraling method10aweighted deviations method1 avan der Linden, WJ uhttp://mail.iacat.org/content/comparison-item-selection-methods-adaptive-tests-content-constraints02792nas a2200469 4500008004100000020002200041245010400063210006900167250001500236260000800251300001200259490000700271520132800278653002201606653003101628653001501659653001601674653001001690653003401700653002101734653002401755653002501779653001501804653001101819653005301830653002901883653001101912653001101923653002001934653000901954653003101963653004601994653003102040653001402071653003202085100001502117700001002132700002502142700001702167700001302184856012502197 2005 eng d a0012-1622 (Print)00aA computer adaptive testing approach for assessing physical functioning in children and adolescents0 acomputer adaptive testing approach for assessing physical functi a2005/02/15 cFeb a113-1200 v473 aThe purpose of this article is to demonstrate: (1) the accuracy and (2) the reduction in amount of time and effort in assessing physical functioning (self-care and mobility domains) of children and adolescents using computer-adaptive testing (CAT). A CAT algorithm selects questions directly tailored to the child's ability level, based on previous responses. Using a CAT algorithm, a simulation study was used to determine the number of items necessary to approximate the score of a full-length assessment. 
We built simulated CAT (5-, 10-, 15-, and 20-item versions) for self-care and mobility domains and tested their accuracy in a normative sample (n=373; 190 males, 183 females; mean age 6y 11mo [SD 4y 2m], range 4mo to 14y 11mo) and a sample of children and adolescents with Pompe disease (n=26; 21 males, 5 females; mean age 6y 1mo [SD 3y 10mo], range 5mo to 14y 10mo). Results indicated that comparable score estimates (based on computer simulations) to the full-length tests can be achieved in a 20-item CAT version for all age ranges and for normative and clinical samples. No more than 13 to 16% of the items in the full-length tests were needed for any one administration. These results support further consideration of using CAT programs for accurate and efficient clinical assessments of physical functioning.10a*Computer Systems10aActivities of Daily Living10aAdolescent10aAge Factors10aChild10aChild Development/*physiology10aChild, Preschool10aComputer Simulation10aConfidence Intervals10aDemography10aFemale10aGlycogen Storage Disease Type II/physiopathology10aHealth Status Indicators10aHumans10aInfant10aInfant, Newborn10aMale10aMotor Activity/*physiology10aOutcome Assessment (Health Care)/*methods10aReproducibility of Results10aSelf Care10aSensitivity and Specificity1 aHaley, S M1 aNi, P1 aFragala-Pinkham, M A1 aSkrinar, A M1 aCorzo, D uhttp://mail.iacat.org/content/computer-adaptive-testing-approach-assessing-physical-functioning-children-and-adolescents02124nas a2200253 4500008004100000245007900041210006900120300001200189490000700201520113300208653002701341653004601368653005201414653002901466653001101495653005601506653002501562653004101587653004501628653006201673100001501735700001501750856010501765 2005 eng d00aContemporary measurement techniques for rehabilitation outcomes assessment0 aContemporary measurement techniques for rehabilitation outcomes a339-3450 v373 aIn this article, we review the limitations of traditional rehabilitation functional outcome instruments 
currently in use within the rehabilitation field to assess Activity and Participation domains as defined by the International Classification of Function, Disability, and Health. These include a narrow scope of functional outcomes, data incompatibility across instruments, and the precision vs feasibility dilemma. Following this, we illustrate how contemporary measurement techniques, such as item response theory methods combined with computer adaptive testing methodology, can be applied in rehabilitation to design functional outcome instruments that are comprehensive in scope, accurate, allow for compatibility across instruments, and are sensitive to clinically important change without sacrificing their feasibility. Finally, we present some of the pressing challenges that need to be overcome to provide effective dissemination and training assistance to ensure that current and future generations of rehabilitation professionals are familiar with and skilled in the application of contemporary outcomes measurement.10a*Disability Evaluation10aActivities of Daily Living/classification10aDisabled Persons/classification/*rehabilitation10aHealth Status Indicators10aHumans10aOutcome Assessment (Health Care)/*methods/standards10aRecovery of Function10aResearch Support, N.I.H., Extramural10aResearch Support, U.S. 
Gov't, Non-P.H.S.10aSensitivity and Specificity computerized adaptive testing1 aJette, A M1 aHaley, S M uhttp://mail.iacat.org/content/contemporary-measurement-techniques-rehabilitation-outcomes-assessment01560nas a2200169 4500008004100000245008000041210006900121300001200190490000700202520094200209653002101151653003001172653005401202100001401256700001301270856010701283 2005 eng d00aControlling item exposure and test overlap in computerized adaptive testing0 aControlling item exposure and test overlap in computerized adapt a204-2170 v293 aThis article proposes an item exposure control method, which is the extension of the Sympson and Hetter procedure and can provide item exposure control at both the item and test levels. Item exposure rate and test overlap rate are two indices commonly used to track item exposure in computerized adaptive tests. By considering both indices, item exposure can be monitored at both the item and test levels. To control the item exposure rate and test overlap rate simultaneously, the modified procedure attempted to control not only the maximum value but also the variance of item exposure rates. Results indicated that the item exposure rate and test overlap rate could be controlled simultaneously by implementing the modified procedure. Item exposure control was improved and precision of trait estimation decreased when a prespecified maximum test overlap rate was stringent. 
(PsycINFO Database Record (c) 2005 APA) (journal abstract)10aAdaptive Testing10aComputer Assisted Testing10aItem Content (Test) computerized adaptive testing1 aChen, S-Y1 aLei, P-W uhttp://mail.iacat.org/content/controlling-item-exposure-and-test-overlap-computerized-adaptive-testing
Our analyses supported four related components of BF: diarrhea, rectal urgency, pain, and bother/distress.10a*Quality of Life10a*Questionnaires10aAdult10aAged10aData Collection/methods10aHumans10aIntestine, Large/*physiopathology10aMale10aMiddle Aged10aProstatic Neoplasms/*physiopathology10aPsychometrics10aResearch Support, Non-U.S. Gov't10aStatistics, Nonparametric1 aEton, D T1 aLai, J S1 aCella, D1 aReeve, B B1 aTalcott, J A1 aClark, J A1 aMcPherson, C P1 aLitwin, M S1 aMoinpour, C M uhttp://mail.iacat.org/content/data-pooling-and-analysis-build-preliminary-item-bank-example-using-bowel-function-prostate02242nas a2200217 4500008004100000020002200041245014200063210006900205260004100274300001200315490000700327520142600334653001401760653003401774653002301808653001601831653002701847100001201874700001701886856012101903 2005 eng d a0022-0655 (Print)00aIncreasing the homogeneity of CAT's item-exposure rates by minimizing or maximizing varied target functions while assembling shadow tests0 aIncreasing the homogeneity of CATs itemexposure rates by minimiz bBlackwell Publishing: United Kingdom a245-2690 v423 aA computerized adaptive testing (CAT) algorithm that has the potential to increase the homogeneity of CATs item-exposure rates without significantly sacrificing the precision of ability estimates was proposed and assessed in the shadow-test (van der Linden & Reese, 1998) CAT context. This CAT algorithm was formed by a combination of maximizing or minimizing varied target functions while assembling shadow tests. There were four target functions to be separately used in the first, second, third, and fourth quarter test of CAT. The elements to be used in the four functions were associated with (a) a random number assigned to each item, (b) the absolute difference between an examinee's current ability estimate and an item difficulty, (c) the absolute difference between an examinee's current ability estimate and an optimum item difficulty, and (d) item information. 
The results indicated that this combined CAT fully utilized all the items in the pool, reduced the maximum exposure rates, and achieved more homogeneous exposure rates. Moreover, its precision in recovering ability estimates was similar to that of the maximum item-information method. The combined CAT method resulted in the best overall results compared with the other individual CAT item-selection methods. The findings from the combined CAT are encouraging. Future uses are discussed. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10aalgorithm10acomputerized adaptive testing10aitem exposure rate10ashadow test10avaried target function1 aLi, Y H1 aSchafer, W D uhttp://mail.iacat.org/content/increasing-homogeneity-cats-item-exposure-rates-minimizing-or-maximizing-varied-target02021nas a2200193 4500008004100000245009300041210006900134300001200203490000700215520136400222653001501586653002401601653001101625653002201636100001801658700001801676700001901694856011401713 2005 eng d00aInfeasibility in automated test assembly models: A comparison study of different methods0 aInfeasibility in automated test assembly models A comparison stu a223-2430 v423 aSeveral techniques exist to automatically put together a test meeting a number of specifications. In an item bank, the items are stored with their characteristics. A test is constructed by selecting a set of items that fulfills the specifications set by the test assembler. Test assembly problems are often formulated in terms of a model consisting of restrictions and an objective to be maximized or minimized. A problem arises when it is impossible to construct a test from the item pool that meets all specifications, that is, when the model is not feasible. Several methods exist to handle these infeasibility problems. In this article, test assembly models resulting from two practical testing programs were reconstructed to be infeasible. 
These models were analyzed using methods that forced a solution (Goal Programming, Multiple-Goal Programming, Greedy Heuristic), that analyzed the causes (Relaxed and Ordered Deletion Algorithm (RODA), Integer Randomized Deletion Algorithm (IRDA), Set Covering (SC), and Item Sampling), or that analyzed the causes and used this information to force a solution (Irreducible Infeasible Set-Solver). Specialized methods such as the IRDA and the Irreducible Infeasible Set-Solver performed best. Recommendations about the use of different methods are given. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAlgorithms10aItem Content (Test)10aModels10aTest Construction1 aHuitzing, H A1 aVeldkamp, B P1 aVerschoor, A J uhttp://mail.iacat.org/content/infeasibility-automated-test-assembly-models-comparison-study-different-methods02444nas a2200373 4500008004100000020004100041245008200082210006900164250001500233260000800248300001000256490000700266520136700273653001001640653000901650653002201659653003301681653003301714653001101747653001101758653000901769653001601778653004001794653001801834653001901852100001301871700001301884700001401897700001201911700001701923700001601940700001501956856009901971 2005 eng d a0895-4356 (Print)0895-4356 (Linking)00aAn item bank was created to improve the measurement of cancer-related fatigue0 aitem bank was created to improve the measurement of cancerrelate a2005/02/01 cFeb a190-70 v583 aOBJECTIVE: Cancer-related fatigue (CRF) is one of the most common unrelieved symptoms experienced by patients. CRF is underrecognized and undertreated due to a lack of clinically sensitive instruments that integrate easily into clinics. Modern computerized adaptive testing (CAT) can overcome these obstacles by enabling precise assessment of fatigue without requiring the administration of a large number of questions. A working item bank is essential for development of a CAT platform. 
The present report describes the building of an operational item bank for use in clinical settings with the ultimate goal of improving CRF identification and treatment. STUDY DESIGN AND SETTING: The sample included 301 cancer patients. Psychometric properties of items were examined by using Rasch analysis, an Item Response Theory (IRT) model. RESULTS AND CONCLUSION: The final bank includes 72 items. These 72 unidimensional items explained 57.5% of the variance, based on factor analysis results. Excellent internal consistency (alpha=0.99) and acceptable item-total correlation were found (range: 0.51-0.85). The 72 items covered a reasonable range of the fatigue continuum. No significant ceiling effects, floor effects, or gaps were found. A sample short form was created for demonstration purposes. The resulting bank is amenable to the development of a CAT platform.10aAdult10aAged10aAged, 80 and over10aFactor Analysis, Statistical10aFatigue/*etiology/psychology10aFemale10aHumans10aMale10aMiddle Aged10aNeoplasms/*complications/psychology10aPsychometrics10aQuestionnaires1 aLai, J-S1 aCella, D1 aDineen, K1 aBode, R1 aVon Roenn, J1 aGershon, RC1 aShevrin, D uhttp://mail.iacat.org/content/item-bank-was-created-improve-measurement-cancer-related-fatigue02903nas a2200409 4500008004100000245012300041210006900164260000800233300001000241490000700251520159700258653004701855653001001902653000901912653001901921653003101940653002601971653001101997653002902008653001102037653000902048653001602057653003902073653001402112653002502126653002702151653003002178653003202208653002802240653002202268100001502290700001602305700001702321700001602338700001502354856012402369 2005 eng d00aMeasuring physical function in patients with complex medical and postsurgical conditions: a computer adaptive approach0 aMeasuring physical function in patients with complex medical and cOct a741-80 v843 aOBJECTIVE: To examine whether the range of disability in the medically complex and postsurgical populations 
receiving rehabilitation is adequately sampled by the new Activity Measure--Post-Acute Care (AM-PAC), and to assess whether computer adaptive testing (CAT) can derive valid patient scores using fewer questions. DESIGN: Observational study of 158 subjects (mean age 67.2 yrs) receiving skilled rehabilitation services in inpatient (acute rehabilitation hospitals, skilled nursing facility units) and community (home health services, outpatient departments) settings for recent-onset or worsening disability from medical (excluding neurological) and surgical (excluding orthopedic) conditions. Measures were interviewer-administered activity questions (all patients) and physical functioning portion of the SF-36 (outpatients) and standardized chart items (11 Functional Independence Measure (FIM), 19 Standardized Outcome and Assessment Information Set (OASIS) items, and 22 Minimum Data Set (MDS) items). Rasch modeling analyzed all data and the relationship between person ability estimates and average item difficulty. CAT assessed the ability to derive accurate patient scores using a sample of questions. RESULTS: The 163-item activity item pool covered the range of physical movement and personal and instrumental activities. CAT analysis showed comparable scores between estimates using 10 items or the total item pool. CONCLUSION: The AM-PAC can assess a broad range of function in patients with complex medical illness. 
CAT achieves valid patient scores using fewer questions.10aActivities of Daily Living/*classification10aAdult10aAged10aCohort Studies10aContinuity of Patient Care10aDisability Evaluation10aFemale10aHealth Services Research10aHumans10aMale10aMiddle Aged10aPostoperative Care/*rehabilitation10aPrognosis10aRecovery of Function10aRehabilitation Centers10aRehabilitation/*standards10aSensitivity and Specificity10aSickness Impact Profile10aTreatment Outcome1 aSiebens, H1 aAndres, P L1 aPengsheng, N1 aCoster, W J1 aHaley, S M uhttp://mail.iacat.org/content/measuring-physical-function-patients-complex-medical-and-postsurgical-conditions-computer02721nas a2200373 4500008004100000245017500041210006900216300001100285490000700296520137800303653003001681653003101711653001501742653001001757653000901767653002201776653003201798653004201830653001101872653003001883653001101913653005101924653003101975653003702006653000902043653001602052653004102068653004102109653002602150100001402176700001802190700001902208856012002227 2005 eng d00aSimulated computerized adaptive tests for measuring functional status were efficient with good discriminant validity in patients with hip, knee, or foot/ankle impairments0 aSimulated computerized adaptive tests for measuring functional s a629-380 v583 aBACKGROUND AND OBJECTIVE: To develop computerized adaptive tests (CATs) designed to assess lower extremity functional status (FS) in people with lower extremity impairments using items from the Lower Extremity Functional Scale and compare discriminant validity of FS measures generated using all items analyzed with a rating scale Item Response Theory model (theta(IRT)) and measures generated using the simulated CATs (theta(CAT)). METHODS: Secondary analysis of retrospective intake rehabilitation data. RESULTS: Unidimensionality of items was strong, and local independence of items was adequate. 
Differential item functioning (DIF) affected item calibration related to body part, that is, hip, knee, or foot/ankle, but DIF did not affect item calibration for symptom acuity, gender, age, or surgical history. Therefore, patients were separated into three body part specific groups. The rating scale model fit all three data sets well. Three body part specific CATs were developed: each was 70% more efficient than using all LEFS items to estimate FS measures. theta(IRT) and theta(CAT) measures discriminated patients by symptom acuity, age, and surgical history in similar ways. theta(CAT) measures were as precise as theta(IRT) measures. CONCLUSION: Body part-specific simulated CATs were efficient and produced precise measures of FS with good discriminant validity.10a*Health Status Indicators10aActivities of Daily Living10aAdolescent10aAdult10aAged10aAged, 80 and over10aAnkle Joint/physiopathology10aDiagnosis, Computer-Assisted/*methods10aFemale10aHip Joint/physiopathology10aHumans10aJoint Diseases/physiopathology/*rehabilitation10aKnee Joint/physiopathology10aLower Extremity/*physiopathology10aMale10aMiddle Aged10aResearch Support, N.I.H., Extramural10aResearch Support, U.S. 
Gov't, P.H.S.10aRetrospective Studies1 aHart, D L1 aMioduski, J E1 aStratford, P W uhttp://mail.iacat.org/content/simulated-computerized-adaptive-tests-measuring-functional-status-were-efficient-good01477nas a2200193 4500008004100000245020200041210006900243300001200312490000700324520066400331653002100995653003001016653005501046653001101101653001801112100001401130700001701144856012201161 2005 eng d00aSomministrazione di test computerizzati di tipo adattivo: Un' applicazione del modello di misurazione di Rasch [Administration of computerized and adaptive tests: An application of the Rasch Model]0 aSomministrazione di test computerizzati di tipo adattivo Un appl a131-1490 v123 aThe aim of the present study is to describe the characteristics of a procedure for administering computerized and adaptive tests (Computer Adaptive Testing or CAT). Items to be asked to the individuals are interactively chosen and are selected from a "bank" in which they were previously calibrated and recorded on the basis of their difficulty level. The selection of items is performed by increasingly more accurate estimates of the examinees' ability. The building of an item-bank on Psychometrics and the implementation of this procedure allow a first validation through Monte Carlo simulations. 
(PsycINFO Database Record (c) 2006 APA ) (journal abstract)10aAdaptive Testing10aComputer Assisted Testing10aItem Response Theory computerized adaptive testing10aModels10aPsychometrics1 aMiceli, R1 aMolinengo, G uhttp://mail.iacat.org/content/somministrazione-di-test-computerizzati-di-tipo-adattivo-un-applicazione-del-modello-di03708nas a2200481 4500008004100000245005200041210005200093300001200145490000700157520221100164653001902375653002902394653005802423653001002481653005302491653000902544653001102553653002502564653002602589653003302615653001102648653001002659653000902669653001602678653002402694653007402718653001802792653002902810653005802839653003102897653003202928653003602960653003202996100001503028700001603043700001603059700001603075700001003091700001403101700001803115700001503133856007803148 2004 eng d00aActivity outcome measurement for postacute care0 aActivity outcome measurement for postacute care aI49-1610 v423 aBACKGROUND: Efforts to evaluate the effectiveness of a broad range of postacute care services have been hindered by the lack of conceptually sound and comprehensive measures of outcomes. It is critical to determine a common underlying structure before employing current methods of item equating across outcome instruments for future item banking and computer-adaptive testing applications. OBJECTIVE: To investigate the factor structure, reliability, and scale properties of items underlying the Activity domains of the International Classification of Functioning, Disability and Health (ICF) for use in postacute care outcome measurement. METHODS: We developed a 41-item Activity Measure for Postacute Care (AM-PAC) that assessed an individual's execution of discrete daily tasks in his or her own environment across major content domains as defined by the ICF. 
We evaluated the reliability and discriminant validity of the prototype AM-PAC in 477 individuals in active rehabilitation programs across 4 rehabilitation settings using factor analyses, tests of item scaling, internal consistency reliability analyses, Rasch item response theory modeling, residual component analysis, and modified parallel analysis. RESULTS: Results from an initial exploratory factor analysis produced 3 distinct, interpretable factors that accounted for 72% of the variance: Applied Cognition (44%), Personal Care & Instrumental Activities (19%), and Physical & Movement Activities (9%); these 3 activity factors were verified by a confirmatory factor analysis. Scaling assumptions were met for each factor in the total sample and across diagnostic groups. Internal consistency reliability was high for the total sample (Cronbach alpha = 0.92 to 0.94), and for specific diagnostic groups (Cronbach alpha = 0.90 to 0.95). Rasch scaling, residual factor, differential item functioning, and modified parallel analyses supported the unidimensionality and goodness of fit of each unique activity domain. CONCLUSIONS: This 3-factor model of the AM-PAC can form the conceptual basis for common-item equating and computer-adaptive applications, leading to a comprehensive system of outcome instruments for postacute care settings.10a*Self Efficacy10a*Sickness Impact Profile10aActivities of Daily Living/*classification/psychology10aAdult10aAftercare/*standards/statistics & numerical data10aAged10aBoston10aCognition/physiology10aDisability Evaluation10aFactor Analysis, Statistical10aFemale10aHuman10aMale10aMiddle Aged10aMovement/physiology10aOutcome Assessment (Health Care)/*methods/statistics & numerical data10aPsychometrics10aQuestionnaires/standards10aRehabilitation/*standards/statistics & numerical data10aReproducibility of Results10aSensitivity and Specificity10aSupport, U.S. Gov't, Non-P.H.S.10aSupport, U.S. 
Gov't, P.H.S.1 aHaley, S M1 aCoster, W J1 aAndres, P L1 aLudlow, L H1 aNi, P1 aBond, T L1 aSinclair, S J1 aJette, A M uhttp://mail.iacat.org/content/activity-outcome-measurement-postacute-care01885nas a2200241 4500008004100000245006000041210005900101260004800160300001200208520112200220653001501342653003401357653002201391653002101413653001901434653001601453653001701469653001301486653002801499100001301527700001601540856008701556 2004 eng d00aAdaptive computerized educational systems: A case study0 aAdaptive computerized educational systems A case study aSan Diego, CA. USAbElsevier Academic Press a143-1693 a(Created by APA) Adaptive instruction describes adjustments typical of one-on-one tutoring as discussed in the college tutorial scenario. So computerized adaptive instruction refers to the use of computer software--almost always incorporating artificially intelligent services--which has been designed to adjust both the presentation of information and the form of questioning to meet the current needs of an individual learner. This chapter describes a system for Internet-delivered adaptive instruction. The author attempts to demonstrate a sharp difference between the teaching that takes place outside of the classroom in universities and the kind that is at least afforded, if not taken advantage of by many, students in a more personalized educational setting such as those in the small liberal arts colleges. The author describes a computer-based technology that allows that gap to be bridged with the advantage of at least having more highly prepared learners sitting in college classrooms. A limited range of emerging research that supports that proposition is cited. 
(PsycINFO Database Record (c) 2005 APA )10aArtificial10aComputer Assisted Instruction10aComputer Software10aHigher Education10aIndividualized10aInstruction10aIntelligence10aInternet10aUndergraduate Education1 aRay, R D1 aMalott, R W uhttp://mail.iacat.org/content/adaptive-computerized-educational-systems-case-study02777nas a2200421 4500008004100000020004600041245011200087210006900199250001500268260001000283300000700293490000600300520144100306653002701747653003001774653004701804653001001851653000901861653002201870653002801892653001101920653001101931653002001942653000901962653001601971653001601987653001902003653001602022653003502038653002902073653004002102653003002142100001402172700001702186700001702203700001402220856012102234 2004 eng d a1477-7525 (Electronic)1477-7525 (Linking)00aThe AMC Linear Disability Score project in a population requiring residential care: psychometric properties0 aAMC Linear Disability Score project in a population requiring re a2004/08/05 cAug 3 a420 v23 aBACKGROUND: Currently there is a lot of interest in the flexible framework offered by item banks for measuring patient relevant outcomes, including functional status. However, there are few item banks, which have been developed to quantify functional status, as expressed by the ability to perform activities of daily life. METHOD: This paper examines the psychometric properties of the AMC Linear Disability Score (ALDS) project item bank using an item response theory model and full information factor analysis. Data were collected from 555 respondents on a total of 160 items. RESULTS: Following the analysis, 79 items remained in the item bank. The remaining 81 items were excluded because of: difficulties in presentation (1 item); low levels of variation in response pattern (28 items); significant differences in measurement characteristics for males and females or for respondents under or over 85 years old (26 items); or lack of model fit to the data at item level (26 items). 
CONCLUSIONS: It is conceivable that the item bank will have different measurement characteristics for other patient or demographic populations. However, these results indicate that the ALDS item bank has sound psychometric properties for respondents in residential care settings and could form a stable base for measuring functional status in a range of situations, including the implementation of computerised adaptive testing of functional status.10a*Disability Evaluation10a*Health Status Indicators10aActivities of Daily Living/*classification10aAdult10aAged10aAged, 80 and over10aData Collection/methods10aFemale10aHumans10aLogistic Models10aMale10aMiddle Aged10aNetherlands10aPilot Projects10aProbability10aPsychometrics/*instrumentation10aQuestionnaires/standards10aResidential Facilities/*utilization10aSeverity of Illness Index1 aHolman, R1 aLindeboom, R1 aVermeulen, M1 aHaan, R J uhttp://mail.iacat.org/content/amc-linear-disability-score-project-population-requiring-residential-care-psychometric01669nas a2200229 4500008004100000245005500041210005300096300000800149490000700157520102900164653002101193653001201214653003001226653001801256653000901274100001701283700001201300700001501312700001601327700001401343856008201357 2004 eng d00aAssisted self-adapted testing: A comparative study0 aAssisted selfadapted testing A comparative study a2-90 v203 aA new type of self-adapted test (S-AT), called Assisted Self-Adapted Test (AS-AT), is presented. It differs from an ordinary S-AT in that prior to selecting the difficulty category, the computer advises examinees on their best difficulty category choice, based on their previous performance. Three tests (computerized adaptive test, AS-AT, and S-AT) were compared regarding both their psychometric (precision and efficiency) and psychological (anxiety) characteristics. Tests were applied in an actual assessment situation, in which test scores determined 20% of term grades. A sample of 173 high school students participated. 
Neither differences in posttest anxiety nor ability were obtained. Concerning precision, AS-AT was as precise as CAT, and both revealed more precision than S-AT. It was concluded that AS-AT acted as a CAT concerning precision. Some hints, but not conclusive support, of the psychological similarity between AS-AT and S-AT was also found. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aAnxiety10aComputer Assisted Testing10aPsychometrics10aTest1 aHontangas, P1 aOlea, J1 aPonsoda, V1 aRevuelta, J1 aWise, S L uhttp://mail.iacat.org/content/assisted-self-adapted-testing-comparative-study01802nas a2200361 4500008004100000020002200041245009500063210006900158250001500227260001100242300001000253490000700263520066300270653002500933653002900958653001000987653000900997653002201006653004501028653003701073653001101110653001101121653000901132653001601141653003601157653003001193653003401223100001601257700002401273700001001297700001501307856011801322 2004 eng d a1074-9357 (Print)00aComputer adaptive testing: a strategy for monitoring stroke rehabilitation across settings0 aComputer adaptive testing a strategy for monitoring stroke rehab a2004/05/01 cSpring a33-390 v113 aCurrent functional assessment instruments in stroke rehabilitation are often setting-specific and lack precision, breadth, and/or feasibility. Computer adaptive testing (CAT) offers a promising potential solution by providing a quick, yet precise, measure of function that can be used across a broad range of patient abilities and in multiple settings. CAT technology yields a precise score by selecting very few relevant items from a large and diverse item pool based on each individual's responses. 
We demonstrate the potential usefulness of a CAT assessment model with a cross-sectional sample of persons with stroke from multiple rehabilitation settings.10a*Computer Simulation10a*User-Computer Interface10aAdult10aAged10aAged, 80 and over10aCerebrovascular Accident/*rehabilitation10aDisabled Persons/*classification10aFemale10aHumans10aMale10aMiddle Aged10aMonitoring, Physiologic/methods10aSeverity of Illness Index10aTask Performance and Analysis1 aAndres, P L1 aBlack-Schaffer, R M1 aNi, P1 aHaley, S M uhttp://mail.iacat.org/content/computer-adaptive-testing-strategy-monitoring-stroke-rehabilitation-across-settings01905nas a2200217 4500008004100000245010000041210006900141260000800210300001100218490000700229520117300236653002201409653001501431653003701446653003101483653001101514653001901525100001201544700001501556856011601571 2004 eng d00aA computerized adaptive knowledge test as an assessment tool in general practice: a pilot study0 acomputerized adaptive knowledge test as an assessment tool in ge cMar a178-830 v263 aAdvantageous to assessment in many fields, CAT (computerized adaptive testing) use in general practice has been scarce. In adapting CAT to general practice, the basic assumptions of item response theory and the case specificity must be taken into account. In this context, this study first evaluated the feasibility of converting written extended matching tests into CAT. Second, it questioned the content validity of CAT. A stratified sample of students was invited to participate in the pilot study. The items used in this test, together with their parameters, originated from the written test. The detailed test paths of the students were retained and analysed thoroughly. Using the predefined pass-fail standard, one student failed the test. There was a positive correlation between the number of items and the candidate's ability level. The majority of students were presented with questions in seven of the 10 existing domains. 
Although proved to be a feasible test format, CAT cannot substitute for the existing high-stakes large-scale written test. It may provide a reliable instrument for identifying candidates who are at risk of failing in the written test.10a*Computer Systems10aAlgorithms10aEducational Measurement/*methods10aFamily Practice/*education10aHumans10aPilot Projects1 aRoex, A1 aDegryse, J uhttp://mail.iacat.org/content/computerized-adaptive-knowledge-test-assessment-tool-general-practice-pilot-study02594nas a2200469 4500008004100000245007200041210006900113300001000182490000600192520108600198653002501284653001001309653001501319653002101334653002201355653005901377653007001436653003301506653001101539653001101550653001301561653000901574653002701583653002201610653005501632653001901687653001501706653006601721653001801787653003701805653004101842653003001883653001301913100001501926700001301941700001801954700001501972700001401987700001402001700001302015856009602028 2004 eng d00aComputerized adaptive measurement of depression: A simulation study0 aComputerized adaptive measurement of depression A simulation stu a13-230 v43 aBackground: Efficient, accurate instruments for measuring depression are increasingly important in clinical practice. We developed a computerized adaptive version of the Beck Depression Inventory (BDI). We examined its efficiency and its usefulness in identifying Major Depressive Episodes (MDE) and in measuring depression severity. Methods: Subjects were 744 participants in research studies in which each subject completed both the BDI and the SCID. In addition, 285 patients completed the Hamilton Depression Rating Scale. Results: The adaptive BDI had an AUC as an indicator of a SCID diagnosis of MDE of 88%, equivalent to the full BDI. The adaptive BDI asked fewer questions than the full BDI (5.6 versus 21 items).
The adaptive latent depression score correlated r = .92 with the BDI total score and the latent depression score correlated more highly with the Hamilton (r = .74) than the BDI total score did (r = .70). Conclusions: Adaptive testing for depression may provide greatly increased efficiency without loss of accuracy in identifying MDE or in measuring depression severity.10a*Computer Simulation10aAdult10aAlgorithms10aArea Under Curve10aComparative Study10aDepressive Disorder/*diagnosis/epidemiology/psychology10aDiagnosis, Computer-Assisted/*methods/statistics & numerical data10aFactor Analysis, Statistical10aFemale10aHumans10aInternet10aMale10aMass Screening/methods10aPatient Selection10aPersonality Inventory/*statistics & numerical data10aPilot Projects10aPrevalence10aPsychiatric Status Rating Scales/*statistics & numerical data10aPsychometrics10aResearch Support, Non-U.S. Gov't10aResearch Support, U.S. Gov't, P.H.S.10aSeverity of Illness Index10aSoftware1 aGardner, W1 aShear, K1 aKelleher, K J1 aPajer, K A1 aMammen, O1 aBuysse, D1 aFrank, E uhttp://mail.iacat.org/content/computerized-adaptive-measurement-depression-simulation-study02473nas a2200193 4500008004100000245010700041210007100148300001200219490000700231520172200238653002101960653003401981653001602015653003002031653002302061653004502084100001502129856013502144 2004 eng d00aÉvaluation et multimédia dans l'apprentissage d'une L2 [Assessment and multimedia in learning an L2]0 aÉvaluation et multimédia dans lapprentissage dune L2 Assessment a475-4870 v163 aIn the first part of this paper different areas where technology may be used for second language assessment are described. First, item banking operations, which are generally based on Item Response Theory but not necessarily restricted to dichotomously scored items, facilitate assessment task organization and require technological support. Second, technology may help to design more authentic assessment tasks or may be needed in some direct testing situations. 
Third, the assessment environment may be more adapted and more stimulating when technology is used to give the student more control. The second part of the paper presents different functions of assessment. The monitoring function (often called formative assessment) aims at adapting the classroom activities to students and to provide continuous feedback. Technology may be used to train the teachers in monitoring techniques, to organize data or to produce diagnostic information; electronic portfolios or quizzes that are built in some educational software may also be used for monitoring. The placement function is probably the one in which the application of computer adaptive testing procedures (e.g. French CAPT) is the most appropriate. Automatic scoring devices may also be used for placement purposes. Finally the certification function requires more valid and more reliable tools. Technology may be used to enhance the testing situation (to make it more authentic) or to facilitate data processing during the construction of a test. Almond et al. (2002) propose a four component model (Selection, Presentation, Scoring and Response) for designing assessment systems. Each component must be planned taking into account the assessment function. 10aAdaptive Testing10aComputer Assisted Instruction10aEducational10aForeign Language Learning10aProgram Evaluation10aTechnology computerized adaptive testing1 aLaurier, M uhttp://mail.iacat.org/content/%C3%A9valuation-et-multim%C3%A9dia-dans-lapprentissage-dune-l2-assessment-and-multimedia-learning-l201774nas a2200193 4500008004100000245023000041210006900271300000900340490000700349520093900356653002101295653003001316653001801346653001601364653004201380100001201422700001901434856012701453 2004 eng d00aKann die Konfundierung von Konzentrationsleistung und Aktivierung durch adaptives Testen mit dem FAKT vermieden werden? 
[Avoiding the confounding of concentration performance and activation by adaptive testing with the FACT]0 aKann die Konfundierung von Konzentrationsleistung und Aktivierun a1-170 v253 aThe study investigates the effect of computerized adaptive testing strategies on the confounding of concentration performance with activation. A sample of 54 participants was administered 1 out of 3 versions (2 adaptive, 1 non-adaptive) of the computerized Frankfurt Adaptive Concentration Test FACT (Moosbrugger & Heyden, 1997) at three subsequent points in time. During the test administration changes in activation (electrodermal activity) were recorded. The results pinpoint a confounding of concentration performance with activation for the non-adaptive test version, but not for the adaptive test versions (p = .01). Thus, adaptive FACT testing strategies can remove the confounding of concentration performance with activation, thereby increasing the discriminant validity. In conclusion, an attention-focusing-hypothesis is formulated to explain the observed effect. 
(PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aComputer Assisted Testing10aConcentration10aPerformance10aTesting computerized adaptive testing1 aFrey, A1 aMoosbrugger, H uhttp://mail.iacat.org/content/kann-die-konfundierung-von-konzentrationsleistung-und-aktivierung-durch-adaptives-testen-mit02844nas a2200349 4500008004100000020004600041245011400087210006900201250001500270260001100285300000700296490000600303520169400309653002702003653002002030653002102050653002002071653004702091653003702138653001802175653001102193653001902204653001602223653002002239653003002259100001402289700001402303700001702317700002002334700001402354856012602368 2004 eng d a1477-7525 (Electronic)1477-7525 (Linking)00aPractical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project0 aPractical methods for dealing with not applicable item responses a2004/06/18 cJun 16 a290 v23 aBACKGROUND: Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. METHODS: The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. 
RESULTS: The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. CONCLUSIONS: The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used.10a*Disability Evaluation10a*Health Surveys10a*Logistic Models10a*Questionnaires10aActivities of Daily Living/*classification10aData Interpretation, Statistical10aHealth Status10aHumans10aPilot Projects10aProbability10aQuality of Life10aSeverity of Illness Index1 aHolman, R1 aGlas, C A1 aLindeboom, R1 aZwinderman, A H1 aHaan, R J uhttp://mail.iacat.org/content/practical-methods-dealing-not-applicable-item-responses-amc-linear-disability-score-project04033nas a2200433 4500008004100000245012300041210006900164260000800233300001200241490000700253520252400260653001902784653002902803653005802832653001002890653000902900653002202909653002602931653003302957653001102990653001103001653000903012653001603021653007403037653003003111653003603141653005803177653003103235653004503266653004103311653003203352100001603384700001503400700001603415700001603431700001403447700001203461856012603473 2004 eng d00aRefining the conceptual basis for rehabilitation outcome measurement: personal care and instrumental activities domain0 aRefining the conceptual basis for rehabilitation outcome measure cJan aI62-1720 v423 aBACKGROUND: Rehabilitation outcome measures routinely include content on performance of daily activities; 
however, the conceptual basis for item selection is rarely specified. These instruments differ significantly in format, number, and specificity of daily activity items and in the measurement dimensions and type of scale used to specify levels of performance. We propose that a requirement for upper limb and hand skills underlies many activities of daily living (ADL) and instrumental activities of daily living (IADL) items in current instruments, and that items selected based on this definition can be placed along a single functional continuum. OBJECTIVE: To examine the dimensional structure and content coverage of a Personal Care and Instrumental Activities item set and to examine the comparability of items from existing instruments and a set of new items as measures of this domain. METHODS: Participants (N = 477) from 3 different disability groups and 4 settings representing the continuum of postacute rehabilitation care were administered the newly developed Activity Measure for Post-Acute Care (AM-PAC), the SF-8, and an additional setting-specific measure: FIM (in-patient rehabilitation); MDS (skilled nursing facility); MDS-PAC (postacute settings); OASIS (home care); or PF-10 (outpatient clinic). Rasch (partial-credit model) analyses were conducted on a set of 62 items covering the Personal Care and Instrumental domain to examine item fit, item functioning, and category difficulty estimates and unidimensionality. RESULTS: After removing 6 misfitting items, the remaining 56 items fit acceptably along the hypothesized continuum. Analyses yielded different difficulty estimates for the maximum score (eg, "Independent performance") for items with comparable content from different instruments. Items showed little differential item functioning across age, diagnosis, or severity groups, and 92% of the participants fit the model. 
CONCLUSIONS: ADL and IADL items from existing rehabilitation outcomes instruments that depend on skilled upper limb and hand use can be located along a single continuum, along with the new personal care and instrumental items of the AM-PAC addressing gaps in content. Results support the validity of the proposed definition of the Personal Care and Instrumental Activities dimension of function as a guide for future development of rehabilitation outcome instruments, such as linked, setting-specific short forms and computerized adaptive testing approaches.10a*Self Efficacy10a*Sickness Impact Profile10aActivities of Daily Living/*classification/psychology10aAdult10aAged10aAged, 80 and over10aDisability Evaluation10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods/statistics & numerical data10aQuestionnaires/*standards10aRecovery of Function/physiology10aRehabilitation/*standards/statistics & numerical data10aReproducibility of Results10aResearch Support, U.S. Gov't, Non-P.H.S.10aResearch Support, U.S. Gov't, P.H.S.10aSensitivity and Specificity1 aCoster, W J1 aHaley, S M1 aAndres, P L1 aLudlow, L H1 aBond, T L1 aNi, P S uhttp://mail.iacat.org/content/refining-conceptual-basis-rehabilitation-outcome-measurement-personal-care-and-instrumental01924nas a2200241 4500008004100000245009400041210006900135300001200204490000700216520110100223653002101324653001301345653003001358653005701388653000901445653003201454653002601486653002001512100001401532700001301546700001501559856010801574 2003 eng d00aA Bayesian method for the detection of item preknowledge in computerized adaptive testing0 aBayesian method for the detection of item preknowledge in comput a121-1370 v273 aWith the increased use of continuous testing in computerized adaptive testing, new concerns about test security have evolved, such as how to ensure that items in an item pool are safeguarded from theft. 
In this article, procedures to detect test takers using item preknowledge are explored. When test takers use item preknowledge, their item responses deviate from the underlying item response theory (IRT) model, and estimated abilities may be inflated. This deviation may be detected through the use of person-fit indices. A Bayesian posterior log odds ratio index is proposed for detecting the use of item preknowledge. In this approach to person fit, the estimated probability that each test taker has preknowledge of items is updated after each item response. These probabilities are based on the IRT parameters, a model specifying the probability that each item has been memorized, and the test taker's item responses. Simulations based on an operational computerized adaptive test (CAT) pool are used to demonstrate the use of the odds ratio index. (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aCheating10aComputer Assisted Testing10aIndividual Differences computerized adaptive testing10aItem10aItem Analysis (Statistical)10aMathematical Modeling10aResponse Theory1 aMcLeod, L1 aLewis, C1 aThissen, D uhttp://mail.iacat.org/content/bayesian-method-detection-item-preknowledge-computerized-adaptive-testing02655nas a2200385 4500008004100000245014400041210006900185300001200254490000700266520138100273653002101654653003301675653002901708653001501737653001001752653000901762653002201771653002601793653003301819653002501852653001901877653001001896653002501906653001601931653002401947653002601971653002701997653003202024653001302056653002802069100001702097700001602114700001402130856012502144 2003 eng d00aCalibration of an item pool for assessing the burden of headaches: an application of item response theory to the Headache Impact Test (HIT)0 aCalibration of an item pool for assessing the burden of headache a913-9330 v123 aBACKGROUND: Measurement of headache impact is important in clinical trials, case detection, and the clinical monitoring of patients. 
Computerized adaptive testing (CAT) of headache impact has potential advantages over traditional fixed-length tests in terms of precision, relevance, real-time quality control and flexibility. OBJECTIVE: To develop an item pool that can be used for a computerized adaptive test of headache impact. METHODS: We analyzed responses to four well-known tests of headache impact from a population-based sample of recent headache sufferers (n = 1016). We used confirmatory factor analysis for categorical data and analyses based on item response theory (IRT). RESULTS: In factor analyses, we found very high correlations between the factors hypothesized by the original test constructers, both within and between the original questionnaires. These results suggest that a single score of headache impact is sufficient. We established a pool of 47 items which fitted the generalized partial credit IRT model. By simulating a computerized adaptive health test we showed that an adaptive test of only five items had a very high concordance with the score based on all items and that different worst-case item selection scenarios did not lead to bias. CONCLUSION: We have established a headache impact item pool that can be used in CAT of headache impact.10a*Cost of Illness10a*Decision Support Techniques10a*Sickness Impact Profile10aAdolescent10aAdult10aAged10aComparative Study10aDisability Evaluation10aFactor Analysis, Statistical10aHeadache/*psychology10aHealth Surveys10aHuman10aLongitudinal Studies10aMiddle Aged10aMigraine/psychology10aModels, Psychological10aPsychometrics/*methods10aQuality of Life/*psychology10aSoftware10aSupport, Non-U.S. Gov't1 aBjorner, J B1 aKosinski, M1 aWare, Jr. 
uhttp://mail.iacat.org/content/calibration-item-pool-assessing-burden-headaches-application-item-response-theory-headache01841nas a2200205 4500008004100000245009000041210006900131300001100200490000700211520111400218653002101332653003001353653001601383653003201399653001601431653004501447100001501492700001601507856011201523 2003 eng d00aA comparative study of item exposure control methods in computerized adaptive testing0 acomparative study of item exposure control methods in computeriz a71-1030 v403 aThis study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study. The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but no one possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The M. L. Stocking and C. Lewis conditional multinomial procedure and, to a slightly lesser extent, the T. Davey and C. G. Parshall method seemed to be the most promising considering all of the factors that this study addressed. 
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aEducational10aItem Analysis (Statistical)10aMeasurement10aStrategies computerized adaptive testing1 aChang, S-W1 aAnsley, T N uhttp://mail.iacat.org/content/comparative-study-item-exposure-control-methods-computerized-adaptive-testing01762nas a2200313 4500008004100000245007700041210006900118300001200187490000700199520083400206653002101040653001501061653001701076653001601093653003001109653001701139653001501156653002501171653002001196653001501216653002501231653001801256653000901274100001801283700001201301700001601313700001601329856010301345 2003 eng d00aComputerized adaptive rating scales for measuring managerial performance0 aComputerized adaptive rating scales for measuring managerial per a237-2460 v113 aComputerized adaptive rating scales (CARS) had been developed to measure contextual or citizenship performance. This rating format used a paired-comparison protocol, presenting pairs of behavioral statements scaled according to effectiveness levels, and an iterative item response theory algorithm to obtain estimates of ratees' citizenship performance (W. C. Borman et al, 2001). In the present research, we developed CARS to measure the entire managerial performance domain, including task and citizenship performance, thus addressing a major limitation of the earlier CARS. The paper describes this development effort, including an adjustment to the algorithm that reduces substantially the number of item pairs required to obtain almost as much precision in the performance estimates. 
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aAlgorithms10aAssociations10aCitizenship10aComputer Assisted Testing10aConstruction10aContextual10aItem Response Theory10aJob Performance10aManagement10aManagement Personnel10aRating Scales10aTest1 aSchneider, RJ1 aGoff, M1 aAnderson, S1 aBorman, W C uhttp://mail.iacat.org/content/computerized-adaptive-rating-scales-measuring-managerial-performance01932nas a2200229 4500008004100000245007200041210006900113300001200182490000700194520116000201653001801361653002101379653003001400653001801430653002501448653002501473653005701498653002201555100001501577700001201592856009801604 2003 eng d00aComputerized adaptive testing using the nearest-neighbors criterion0 aComputerized adaptive testing using the nearestneighbors criteri a204-2160 v273 aItem selection procedures designed for computerized adaptive testing need to accurately estimate every taker's trait level (θ) and, at the same time, effectively use all items in a bank. Empirical studies showed that classical item selection procedures based on maximizing Fisher or other related information yielded highly varied item exposure rates; with these procedures, some items were frequently used whereas others were rarely selected. In the literature, methods have been proposed for controlling exposure rates; they tend to affect the accuracy in θ estimates, however. A modified version of the maximum Fisher information (MFI) criterion, coined the nearest neighbors (NN) criterion, is proposed in this study. The NN procedure improves to a moderate extent the undesirable item exposure rates associated with the MFI criterion and keeps sufficient precision in estimates. The NN criterion will be compared with a few other existing methods in an empirical study using the mean squared errors in θ estimates and plots of item exposure rates associated with different distributions. 
(PsycINFO Database Record (c) 2005 APA ) (journal abstract)10a(Statistical)10aAdaptive Testing10aComputer Assisted Testing10aItem Analysis10aItem Response Theory10aStatistical Analysis10aStatistical Estimation computerized adaptive testing10aStatistical Tests1 aCheng, P E1 aLiou, M uhttp://mail.iacat.org/content/computerized-adaptive-testing-using-nearest-neighbors-criterion01805nas a2200277 4500008004100000245007600041210006900117300001100186490000600197520090300203653001501106653002901121653003001150653002001180653001101200653005001211653001801261653003201279653004101311653001801352100001401370700001301384700001301397700001901410856009801429 2003 eng d00aDeveloping an initial physical function item bank from existing sources0 aDeveloping an initial physical function item bank from existing a124-360 v43 aThe objective of this article is to illustrate incremental item banking using health-related quality of life data collected from two samples of patients receiving cancer treatment. The kinds of decisions one faces in establishing an item bank for computerized adaptive testing are also illustrated. Pre-calibration procedures include: identifying common items across databases; creating a new database with data from each pool; reverse-scoring "negative" items; identifying rating scales used in items; identifying pivot points in each rating scale; pivot anchoring items at comparable rating scale categories; and identifying items in each instrument that measure the construct of interest. A series of calibrations were conducted in which a small proportion of new items were added to the common core and misfitting items were identified and deleted until an initial item bank has been developed.10a*Databases10a*Sickness Impact Profile10aAdaptation, Psychological10aData Collection10aHumans10aNeoplasms/*physiopathology/psychology/therapy10aPsychometrics10aQuality of Life/*psychology10aResearch Support, U.S. 
Gov't, P.H.S.10aUnited States1 aBode, R K1 aCella, D1 aLai, J S1 aHeinemann, A W uhttp://mail.iacat.org/content/developing-initial-physical-function-item-bank-existing-sources01817nas a2200253 4500008004100000245013100041210006900172300001000241490000600251520095700257653001501214653002901229653002501258653001501283653002001298653001101318653003101329100001401360700001601374700001401390700001401404700002101418856012401439 2003 eng d00aAn examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model0 aexamination of exposure control and content balancing restrictio a24-420 v43 aThe purpose of the present investigation was to systematically examine the effectiveness of the Sympson-Hetter technique and rotated content balancing relative to no exposure control and no content rotation conditions in a computerized adaptive testing system (CAT) based on the partial credit model. A series of simulated fixed and variable length CATs were run using two data sets generated to multiple content areas for three sizes of item pools. The 2 (exposure control) X 2 (content rotation) X 2 (test length) X 3 (item pool size) X 2 (data sets) yielded a total of 48 conditions. Results show that while both procedures can be used with no deleterious effect on measurement precision, the gains in exposure control, pool utilization, and item overlap appear quite modest. 
Difficulties involved with setting the exposure control parameters in small item pools make questionable the utility of the Sympson-Hetter technique with similar item pools.10a*Computers10a*Educational Measurement10a*Models, Theoretical10aAutomation10aDecision Making10aHumans10aReproducibility of Results1 aDavis, LL1 aPastor, D A1 aDodd, B G1 aChiang, C1 aFitzpatrick, S J uhttp://mail.iacat.org/content/examination-exposure-control-and-content-balancing-restrictions-item-selection-cats-using03167nas a2200361 4500008004100000245012500041210006900166300001200235490000700247520200400254653002902258653001502287653001002302653000902312653002202321653002002343653003302363653002402396653001102420653001002431653000902441653001602450653002502466653002602491653004302517653003202560653001902592653002802611100001702639700001602656700001402672856011902686 2003 eng d00aThe feasibility of applying item response theory to measures of migraine impact: a re-analysis of three clinical studies0 afeasibility of applying item response theory to measures of migr a887-9020 v123 aBACKGROUND: Item response theory (IRT) is a powerful framework for analyzing multiitem scales and is central to the implementation of computerized adaptive testing. OBJECTIVES: To explain the use of IRT to examine measurement properties and to apply IRT to a questionnaire for measuring migraine impact--the Migraine Specific Questionnaire (MSQ). METHODS: Data from three clinical studies that employed the MSQ-version 1 were analyzed by confirmatory factor analysis for categorical data and by IRT modeling. RESULTS: Confirmatory factor analyses showed very high correlations between the factors hypothesized by the original test constructions. Further, high item loadings on one common factor suggest that migraine impact may be adequately assessed by only one score. IRT analyses of the MSQ were feasible and provided several suggestions as to how to improve the items and in particular the response choices. 
Out of 15 items, 13 showed adequate fit to the IRT model. In general, IRT scores were strongly associated with the scores proposed by the original test developers and with the total item sum score. Analysis of response consistency showed that more than 90% of the patients answered consistently according to a unidimensional IRT model. For the remaining patients, scores on the dimension of emotional function were less strongly related to the overall IRT scores that mainly reflected role limitations. Such response patterns can be detected easily using response consistency indices. Analysis of test precision across score levels revealed that the MSQ was most precise at one standard deviation worse than the mean impact level for migraine patients that are not in treatment. Thus, gains in test precision can be achieved by developing items aimed at less severe levels of migraine impact. CONCLUSIONS: IRT proved useful for analyzing the MSQ. The approach warrants further testing in a more comprehensive item pool for headache impact that would enable computerized adaptive testing.10a*Sickness Impact Profile10aAdolescent10aAdult10aAged10aComparative Study10aCost of Illness10aFactor Analysis, Statistical10aFeasibility Studies10aFemale10aHuman10aMale10aMiddle Aged10aMigraine/*psychology10aModels, Psychological10aPsychometrics/instrumentation/*methods10aQuality of Life/*psychology10aQuestionnaires10aSupport, Non-U.S. Gov't1 aBjorner, J B1 aKosinski, M1 aWare, Jr. 
uhttp://mail.iacat.org/content/feasibility-applying-item-response-theory-measures-migraine-impact-re-analysis-three02749nas a2200349 4500008004100000245015800041210006900199260000800268300001200276490000700288520160300295653003001898653002001928653001001948653003201958653001101990653001102001653000902012653001602021653002802037653001802065653003702083653004102120653002802161100001302189700001502202700001302217700001502230700001402245700001902259856012102278 2003 eng d00aItem banking to improve, shorten and computerize self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale0 aItem banking to improve shorten and computerize selfreported fa cAug a485-5010 v123 aFatigue is a common symptom among cancer patients and the general population. Due to its subjective nature, fatigue has been difficult to effectively and efficiently assess. Modern computerized adaptive testing (CAT) can enable precise assessment of fatigue using a small number of items from a fatigue item bank. CAT enables brief assessment by selecting questions from an item bank that provide the maximum amount of information given a person's previous responses. This article illustrates steps to prepare such an item bank, using 13 items from the Functional Assessment of Chronic Illness Therapy Fatigue Subscale (FACIT-F) as the basis. Samples included 1022 cancer patients and 1010 people from the general population. An Item Response Theory (IRT)-based rating scale model, a polytomous extension of the Rasch dichotomous model, was utilized. Nine items demonstrating acceptable psychometric properties were selected and positioned on the fatigue continuum. The fatigue levels measured by these nine items along with their response categories covered 66.8% of the general population and 82.6% of the cancer patients. 
Although the operational CAT algorithms to handle polytomously scored items are still in progress, we illustrated how CAT may work by using nine core items to measure level of fatigue. Using this illustration, a fatigue measure comparable to its full-length 13-item scale administration was obtained using four items. The resulting item bank can serve as a core to which will be added a psychometrically sound and operational item bank covering the entire fatigue continuum.10a*Health Status Indicators10a*Questionnaires10aAdult10aFatigue/*diagnosis/etiology10aFemale10aHumans10aMale10aMiddle Aged10aNeoplasms/complications10aPsychometrics10aResearch Support, Non-U.S. Gov't10aResearch Support, U.S. Gov't, P.H.S.10aSickness Impact Profile1 aLai, J-S1 aCrane, P K1 aCella, D1 aChang, C-H1 aBode, R K1 aHeinemann, A W uhttp://mail.iacat.org/content/item-banking-improve-shorten-and-computerized-self-reported-fatigue-illustration-steps01748nas a2200217 4500008004100000245008700041210006900128300001200197490000700209520100200216653002101218653003001239653002601269653002501295653002001320653001401340653004901354100001401403700001401417856009901431 2003 eng d00aItem exposure constraints for testlets in the verbal reasoning section of the MCAT0 aItem exposure constraints for testlets in the verbal reasoning s a335-3560 v273 aThe current study examined item exposure control procedures for testlet scored reading passages in the Verbal Reasoning section of the Medical College Admission Test with four computerized adaptive testing (CAT) systems using the partial credit model. The first system used a traditional CAT using maximum information item selection. The second used random item selection to provide a baseline for optimal exposure rates. The third used a variation of Lunz and Stahl's randomization procedure. The fourth used Luecht and Nungester's computerized adaptive sequential testing (CAST) system. 
A series of simulated fixed-length CATs was run to determine the optimal item length selection procedure. Results indicated that both the randomization procedure and CAST performed well in terms of exposure control and measurement precision, with the CAST system providing the best overall solution when all variables were taken into consideration. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aComputer Assisted Testing10aEntrance Examinations10aItem Response Theory10aRandom Sampling10aReasoning10aVerbal Ability computerized adaptive testing1 aDavis, LL1 aDodd, B G uhttp://mail.iacat.org/content/item-exposure-constraints-testlets-verbal-reasoning-section-mcat01437nas a2200205 4500008004100000245008800041210007000129300001200199490000700211520067700218653002100895653003000916653002400946653002500970653002600995653005201021100001901073700002301092856011601115 2003 eng d00aOptimal stratification of item pools in α-stratified computerized adaptive testing0 aOptimal stratification of item pools in αstratified computerized a262-2740 v273 aA method based on 0-1 linear programming (LP) is presented to stratify an item pool optimally for use in α-stratified adaptive testing. Because the 0-1 LP model belongs to the subclass of models with a network flow structure, efficient solutions are possible. The method is applied to a previous item pool from the computerized adaptive testing (CAT) version of the Graduate Record Exams (GRE) Quantitative Test. The results indicate that the new method performs well in practical situations. It improves item exposure control, reduces the mean squared error in the θ estimates, and increases test reliability. 
(PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aComputer Assisted Testing10aItem Content (Test)10aItem Response Theory10aMathematical Modeling10aTest Construction computerized adaptive testing1 aChang, Hua-Hua1 avan der Linden, WJ uhttp://mail.iacat.org/content/optimal-stratification-item-pools-%CE%B1-stratified-computerized-adaptive-testing01802nas a2200241 4500008004100000245009300041210006900134300001200203490000700215520098200222653001801204653002101222653003001243653001901273653004601292653001801338653002501356653001501381100001401396700001901410700001501429856011601444 2003 eng d00aThe relationship between item exposure and test overlap in computerized adaptive testing0 arelationship between item exposure and test overlap in computeri a129-1450 v403 aThe purpose of this article is to present an analytical derivation for the mathematical form of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CATs). This algebraic relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. The results indicate that, in fixed-length CATs, control of the average between-test overlap is achieved via the mean and variance of the item exposure rates of the items that constitute the CAT item pool. The mean of the item exposure rates is easily manipulated. Control over the variance of the item exposure rates can be achieved via the maximum item exposure rate (r_max). Therefore, item exposure control methods which implement a specification of r_max (e.g., J. B. Sympson and R. D. Hetter, 1985) provide the most direct control at both the item and test levels. 
(PsycINFO Database Record (c) 2005 APA )10a(Statistical)10aAdaptive Testing10aComputer Assisted Testing10aHuman Computer10aInteraction computerized adaptive testing10aItem Analysis10aItem Analysis (Test)10aTest Items1 aChen, S-Y1 aAnkenmann, R D1 aSpray, J A uhttp://mail.iacat.org/content/relationship-between-item-exposure-and-test-overlap-computerized-adaptive-testing01521nas a2200157 4500008004100000245009500041210006900136300001200205490000700217520090100224653002101125653003001146653004501176100002301221856011901244 2003 eng d00aSome alternatives to Sympson-Hetter item-exposure control in computerized adaptive testing0 aSome alternatives to SympsonHetter itemexposure control in compu a249-2650 v283 aThe Hetter and Sympson (1997; 1985) method is a method of probabilistic item-exposure control in computerized adaptive testing. Setting its control parameters to admissible values requires an iterative process of computer simulations that has been found to be time consuming, particularly if the parameters have to be set conditional on a realistic set of values for the examinees’ ability parameter. Formal properties of the method are identified that help us explain why this iterative process can be slow and does not guarantee admissibility. In addition, some alternatives to the SH method are introduced. The behavior of these alternatives was estimated for an adaptive test from an item pool from the Law School Admission Test (LSAT). Two of the alternatives showed attractive behavior and converged smoothly to admissibility for all items in a relatively small number of iteration steps. 
10aAdaptive Testing10aComputer Assisted Testing10aTest Items computerized adaptive testing1 avan der Linden, WJ uhttp://mail.iacat.org/content/some-alternatives-sympson-hetter-item-exposure-control-computerized-adaptive-testing01602nas a2200205 4500008004100000245009400041210006900135260001000204300001200214490000800226520086400234653003001098653000901128653003401137653001101171653003501182653004501217100001801262856011601280 2003 eng d00aTen recommendations for advancing patient-centered outcomes measurement for older persons0 aTen recommendations for advancing patientcentered outcomes measu cSep 2 a403-4090 v1393 aThe past 50 years have seen great progress in the measurement of patient-based outcomes for older populations. Most of the measures now used were created under the umbrella of a set of assumptions and procedures known as classical test theory. A recent alternative for health status assessment is item response theory. Item response theory is superior to classical test theory because it can eliminate test dependency and achieve more precise measurement through computerized adaptive testing. Computerized adaptive testing reduces test administration times and allows varied and precise estimates of ability. Several key challenges must be met before computerized adaptive testing becomes a productive reality. I discuss these challenges for the health assessment of older persons in the form of 10 "Ds": things we need to deliberate, debate, decide, and do.10a*Health Status Indicators10aAged10aGeriatric Assessment/*methods10aHumans10aPatient-Centered Care/*methods10aResearch Support, U.S. 
Gov't, Non-P.H.S.1 aMcHorney, C A uhttp://mail.iacat.org/content/ten-recommendations-advancing-patient-centered-outcomes-measurement-older-persons01550nas a2200193 4500008004100000245026000041210006900301300001000370490000700380520067700387653002101064653002201085653001701107653001501124653004801139100002201187700002201209856012501231 2003 eng d00aTiming behavior in computerized adaptive testing: Response times for correct and incorrect answers are not related to general fluid intelligence/Zum Zeitverhalten beim computergestützten adaptiven Testen: Antwortlatenzen bei richtigen und falschen Lösun0 aTiming behavior in computerized adaptive testing Response times a57-630 v243 aExamined the effects of general fluid intelligence on item response times for correct and false responses in computerized adaptive testing. After performing the CFT3 intelligence test, 80 individuals (aged 17-44 yrs) completed perceptual and cognitive discrimination tasks. Results show that response times were related neither to the proficiency dimension reflected by the task nor to the individual level of fluid intelligence. Furthermore, the false > correct phenomenon as well as substantial positive correlations between item response times for false and correct responses were shown to be independent of intelligence levels. 
10aAdaptive Testing10aCognitive Ability10aIntelligence10aPerception10aReaction Time computerized adaptive testing1 aRammsayer, Thomas1 aBrandler, Susanne uhttp://mail.iacat.org/content/timing-behavior-computerized-adaptive-testing-response-times-correct-and-incorrect-answers01517nas a2200229 4500008004100000245008700041210006900128300001200197490000700209520075300216653002100969653001300990653003001003653003401033653001101067653001501078653001501093653001801108100002301126700002701149856011101176 2003 eng d00aUsing response times to detect aberrant responses in computerized adaptive testing0 aUsing response times to detect aberrant responses in computerize a251-2650 v683 aA lognormal model for response times is used to check response times for aberrances in examinee behavior on computerized adaptive tests. Both classical procedures and Bayesian posterior predictive checks are presented. For a fixed examinee, responses and response times are independent; checks based on response times thus offer information independent of the results of checks on response patterns. Empirical examples of the use of classical and Bayesian checks for detecting two different types of aberrances in response times are presented. The detection rates for the Bayesian checks outperformed those for the classical checks, but at the cost of higher false-alarm rates. 
A guideline for the choice between the two types of checks is offered.10aAdaptive Testing10aBehavior10aComputer Assisted Testing10acomputerized adaptive testing10aModels10aperson Fit10aPrediction10aReaction Time1 avan der Linden, WJ1 aKrimpen-Stoop, E M L A uhttp://mail.iacat.org/content/using-response-times-detect-aberrant-responses-computerized-adaptive-testing01983nas a2200289 4500008004100000245007600041210006900117260000800186300001200194490000700206520114900213653002401362653001301386653002101399653002001420653001501440653001001455653001001465653001101475653001101486653000901497653002401506653002601530100001601556700001501572856010601587 2002 eng d00aAssessing tobacco beliefs among youth using item response theory models0 aAssessing tobacco beliefs among youth using item response theory cNov aS21-S390 v683 aSuccessful intervention research programs to prevent adolescent smoking require well-chosen, psychometrically sound instruments for assessing smoking prevalence and attitudes. Twelve thousand eight hundred and ten adolescents were surveyed about their smoking beliefs as part of the Teenage Attitudes and Practices Survey project, a prospective cohort study of predictors of smoking initiation among US adolescents. Item response theory (IRT) methods are used to frame a discussion of questions that a researcher might ask when selecting an optimal item set. IRT methods are especially useful for choosing items during instrument development, trait scoring, evaluating item functioning across groups, and creating optimal item subsets for use in specialized applications such as computerized adaptive testing. Data analytic steps for IRT modeling are reviewed for evaluating item quality and differential item functioning across subgroups of gender, age, and smoking status. 
Implications and challenges in the use of these methods for tobacco onset research and for assessing the developmental trajectories of smoking among youth are discussed.10a*Attitude to Health10a*Culture10a*Health Behavior10a*Questionnaires10aAdolescent10aAdult10aChild10aFemale10aHumans10aMale10aModels, Statistical10aSmoking/*epidemiology1 aPanter, A T1 aReeve, B B uhttp://mail.iacat.org/content/assessing-tobacco-beliefs-among-youth-using-item-response-theory-models02086nas a2200229 4500008004100000245012900041210006900170300001200239490000700251520124000258653001801498653002101516653004501537653003001582653001801612653002501630653002601655100001601681700001401697700001901711856012601730 2002 eng d00aA comparison of item selection techniques and exposure control mechanisms in CATs using the generalized partial credit model0 acomparison of item selection techniques and exposure control mec a147-1630 v263 aThe use of more performance items in large-scale testing has led to an increase in the research investigating the use of polytomously scored items in computer adaptive testing (CAT). Because this research has to be complemented with information pertaining to exposure control, the present research investigated the impact of using five different exposure control algorithms in two sized item pools calibrated using the generalized partial credit model. The results of the simulation study indicated that the a-stratified design, in comparison to a no-exposure control condition, could be used to reduce item exposure and overlap, increase pool utilization, and only slightly degrade measurement precision. Use of the more restrictive exposure control algorithms, such as the Sympson-Hetter and conditional Sympson-Hetter, controlled exposure to a greater extent but at the cost of measurement precision. 
Because convergence of the exposure control parameters was problematic for some of the more restrictive exposure control algorithms, use of the simpler exposure control mechanisms, particularly when the test length to item pool size ratio is large, is recommended.10a(Statistical)10aAdaptive Testing10aAlgorithms computerized adaptive testing10aComputer Assisted Testing10aItem Analysis10aItem Response Theory10aMathematical Modeling1 aPastor, D A1 aDodd, B G1 aChang, Hua-Hua uhttp://mail.iacat.org/content/comparison-item-selection-techniques-and-exposure-control-mechanisms-cats-using-generalized
The Medical Outcomes Study 36-Item Short-Form Health Survey's physical functioning scale (PF-10) is the foundation of the physical FHS. The Oswestry Low Back Pain Disability Questionnaire, Neck Disability Index, Lysholm Knee Questionnaire, items pertinent to patients with upper-extremity impairments, and items pertinent to patients with more involved neuromusculoskeletal impairments were cocalibrated into the PF-10. RESULTS: The final FHS item bank contained 36 items (patient separation, 2.3; root mean square measurement error, 5.9; mean square +/- SD infit, 0.9+/-0.5; outfit, 0.9+/-0.9). Analyses supported empirical item hierarchy, unidimensionality, reproducibility of item calibrations, and content and construct validity of the FHS-36. CONCLUSIONS: Results support the reliability and validity of FHS-36 measures in the present sample. Analyses show the potential for a dynamic, computer-controlled, adaptive survey for FHS assessment applicable for group analysis and clinical decision making for individual patients.10a*Health Status Indicators10a*Rehabilitation Centers10aAdolescent10aAdult10aAged10aAged, 80 and over10aFemale10aHealth Surveys10aHumans10aMale10aMiddle Aged10aMusculoskeletal Diseases/*physiopathology/*rehabilitation10aNervous System Diseases/*physiopathology/*rehabilitation10aPhysical Fitness/*physiology10aRecovery of Function/physiology10aReproducibility of Results10aRetrospective Studies1 aHart, D L1 aWright, B D uhttp://mail.iacat.org/content/development-index-physical-functional-health-status-rehabilitation01639nas a2200217 4500008004100000245009700041210006900138300001200207490000700219520087500226653002101101653003001122653002501152653002301177653002501200653002801225653002701253100001301280700001501293856011301308 2002 eng d00aAn EM approach to parameter estimation for the Zinnes and Griggs paired comparison IRT model0 aEM approach to parameter estimation for the Zinnes and Griggs pa a208-2270 v263 aBorman et al. 
recently proposed a computer adaptive performance appraisal system called CARS II that utilizes paired comparison judgments of behavioral stimuli. To implement this approach, the paired comparison ideal point model developed by Zinnes and Griggs was selected. In this article, the authors describe item response and information functions for the Zinnes and Griggs model and present procedures for estimating stimulus and person parameters. Monte Carlo simulations were conducted to assess the accuracy of the parameter estimation procedures. The results indicated that at least 400 ratees (i.e., ratings) are required to obtain reasonably accurate estimates of the stimulus parameters and their standard errors. In addition, latent trait estimation improves as test length increases. The implications of these results for test construction are also discussed. 10aAdaptive Testing10aComputer Assisted Testing10aItem Response Theory10aMaximum Likelihood10aPersonnel Evaluation10aStatistical Correlation10aStatistical Estimation1 aStark, S1 aDrasgow, F uhttp://mail.iacat.org/content/em-approach-parameter-estimation-zinnes-and-griggs-paired-comparison-irt-model02014nas a2200205 4500008004100000245008200041210006900123300001200192490000700204520132300211653002101534653001501555653003001570653001101600653000901611653004701620100001901667700001301686856010901699 2002 eng d00aHypergeometric family and item overlap rates in computerized adaptive testing0 aHypergeometric family and item overlap rates in computerized ada a387-3980 v673 aA computerized adaptive test (CAT) is usually administered to small groups of examinees at frequent time intervals. It is often the case that examinees who take the test earlier share information with examinees who will take the test later, thus increasing the risk that many items may become known. Item overlap rate for a group of examinees refers to the number of overlapping items encountered by these examinees divided by the test length. 
For a specific item pool, different item selection algorithms may yield different item overlap rates. An important issue in designing a good CAT item selection algorithm is to keep item overlap rate below a preset level. In doing so, it is important to investigate what the lowest rate could be for all possible item selection algorithms. In this paper we rigorously prove that if every item had an equal probability to be selected from the pool in a fixed-length CAT, the number of overlapping items among any α randomly sampled examinees follows the hypergeometric distribution family for α ≥ 1. Thus, the expected values of the number of overlapping items among any randomly sampled α examinees can be calculated precisely. These values may serve as benchmarks in controlling item overlap rates for fixed-length adaptive tests.10aAdaptive Testing10aAlgorithms10aComputer Assisted Testing10aTaking10aTest10aTime On Task computerized adaptive testing1 aChang, Hua-Hua1 aZhang, J uhttp://mail.iacat.org/content/hypergeometric-family-and-item-overlap-rates-computerized-adaptive-testing01694nas a2200277 4500008004100000020001000041245006500051210006400116260009700180300001100277520074500288653002101033653002201054653002501076653002801101653002501129653001601154653001801170653005501188653001501243653001201258100001801270700002301288700001301311856009201324 2002 eng d a02-0900aMathematical-programming approaches to test item pool design0 aMathematicalprogramming approaches to test item pool design aTwente, The NetherlandsbUniversity of Twente, Faculty of Educational Science and Technology a93-1083 a(From the chapter) This paper presents an approach to item pool design that has the potential to improve on the quality of current item pools in educational and psychological testing and hence to increase both measurement precision and validity. 
The approach consists of the application of mathematical programming techniques to calculate optimal blueprints for item pools. These blueprints can be used to guide the item-writing process. Three different types of design problems are discussed, namely for item pools for linear tests, item pools for computerized adaptive testing (CAT), and systems of rotating item pools for CAT. The paper concludes with an empirical example of the problem of designing a system of rotating item pools for CAT.10aAdaptive Testing10aComputer Assisted10aComputer Programming10aEducational Measurement10aItem Response Theory10aMathematics10aPsychometrics10aStatistical Rotation computerized adaptive testing10aTest Items10aTesting1 aVeldkamp, B P1 avan der Linden, WJ1 aAriel, A uhttp://mail.iacat.org/content/mathematical-programming-approaches-test-item-pool-design02416nas a2200277 4500008004100000245012200041210006900163260000800232300001000240490000700250520148700257653002101744653002101765653002001786653001001806653002201816653002901838653001101867653001801878653001901896653004101915653003201956100001301988700001802001856011902019 2002 eng d00aMeasuring quality of life in chronic illness: the functional assessment of chronic illness therapy measurement system0 aMeasuring quality of life in chronic illness the functional asse cDec aS10-70 v833 aWe focus on quality of life (QOL) measurement as applied to chronic illness. There are 2 major types of health-related quality of life (HRQOL) instruments: generic health status and targeted. Generic instruments offer the opportunity to compare results across patient and population cohorts, and some can provide normative or benchmark data from which to interpret results. Targeted instruments ask questions that focus more on the specific condition or treatment under study and, as a result, tend to be more responsive to clinically important changes than generic instruments. 
Each type of instrument has a place in the assessment of HRQOL in chronic illness, and consideration of the relative advantages and disadvantages of the 2 options best drives choice of instrument. The Functional Assessment of Chronic Illness Therapy (FACIT) system of HRQOL measurement is a hybrid of the 2 approaches. The FACIT system combines a core general measure with supplemental measures targeted toward specific diseases, conditions, or treatments. Thus, it capitalizes on the strengths of each type of measure. Recently, FACIT questionnaires were administered to a representative sample of the general population with results used to derive FACIT norms. These normative data can be used for benchmarking and to better understand changes in HRQOL that are often seen in clinical trials. Future directions in HRQOL assessment include test equating, item banking, and computerized adaptive testing.10a*Chronic Disease10a*Quality of Life10a*Rehabilitation10aAdult10aComparative Study10aHealth Status Indicators10aHumans10aPsychometrics10aQuestionnaires10aResearch Support, U.S. Gov't, P.H.S.10aSensitivity and Specificity1 aCella, D1 aNowinski, C J uhttp://mail.iacat.org/content/measuring-quality-life-chronic-illness-functional-assessment-chronic-illness-therapy03063nas a2200325 4500008004100000020004100041245008100082210006900163250001500232260000800247300001100255490000700266520201300273653001502286653001002301653004002311653005702351653003302408653001102441653001102452653001802463653000902481653002802490653001202518653005502530100001502585700001802600700001502618856010402633 2002 eng d a0025-7079 (Print)0025-7079 (Linking)00aMultidimensional adaptive testing for mental health problems in primary care0 aMultidimensional adaptive testing for mental health problems in a2002/09/10 cSep a812-230 v403 aOBJECTIVES: Efficient and accurate instruments for assessing child psychopathology are increasingly important in clinical practice and research. 
For example, screening in primary care settings can identify children and adolescents with disorders that may otherwise go undetected. However, primary care offices are notorious for the brevity of visits and screening must not burden patients or staff with long questionnaires. One solution is to shorten assessment instruments, but dropping questions typically makes an instrument less accurate. An alternative is adaptive testing, in which a computer selects the items to be asked of a patient based on the patient's previous responses. This research used a simulation to test a child mental health screen based on this technology. RESEARCH DESIGN: Using half of a large sample of data, a computerized version was developed of the Pediatric Symptom Checklist (PSC), a parental-report psychosocial problem screen. With the unused data, a simulation was conducted to determine whether the Adaptive PSC can reproduce the results of the full PSC with greater efficiency. SUBJECTS: PSCs were completed by parents on 21,150 children seen in a national sample of primary care practices. RESULTS: Four latent psychosocial problem dimensions were identified through factor analysis: internalizing problems, externalizing problems, attention problems, and school problems. A simulated adaptive test measuring these traits asked an average of 11.6 questions per patient, and asked five or fewer questions for 49% of the sample. There was high agreement between the adaptive test and the full (35-item) PSC: only 1.3% of screening decisions were discordant (kappa = 0.93). This agreement was higher than that obtained using a comparable length (12-item) short-form PSC (3.2% of decisions discordant; kappa = 0.84). 
CONCLUSIONS: Multidimensional adaptive testing may be an accurate and efficient technology for screening for mental health problems in primary care settings.10aAdolescent10aChild10aChild Behavior Disorders/*diagnosis10aChild Health Services/*organization & administration10aFactor Analysis, Statistical10aFemale10aHumans10aLinear Models10aMale10aMass Screening/*methods10aParents10aPrimary Health Care/*organization & administration1 aGardner, W1 aKelleher, K J1 aPajer, K A uhttp://mail.iacat.org/content/multidimensional-adaptive-testing-mental-health-problems-primary-care01632nas a2200241 4500008004100000245005900041210005800100300001200158490000700170520087100177653002101048653003401069653002801103653002001131653003201151653002501183653001501208653002701223653002201250653001601272100001601288856008601304 2002 eng d00aOutlier detection in high-stakes certification testing0 aOutlier detection in highstakes certification testing a219-2330 v393 aDiscusses recent developments of person-fit analysis in computerized adaptive testing (CAT). Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in CAT. Most person-fit research in CAT is restricted to simulated data. In this study, empirical data from a certification test were used. Alternatives are discussed to generate norms so that bounds can be determined to classify an item score pattern as fitting or misfitting. Using bounds determined from a sample of a high-stakes certification test, the empirical analysis showed that different types of misfit can be distinguished. Further applications using statistical process control methods to detect misfitting item score patterns are discussed. 
10aAdaptive Testing10acomputerized adaptive testing10aEducational Measurement10aGoodness of Fit10aItem Analysis (Statistical)10aItem Response Theory10aperson Fit10aStatistical Estimation10aStatistical Power10aTest Scores1 aMeijer, R R uhttp://mail.iacat.org/content/outlier-detection-high-stakes-certification-testing02035nas a2200253 4500008004100000245010900041210006900150300000900219490000600228520114100234653002101375653001501396653003901411653002201450653002501472653001801497653002201515653005501537653001501592653001201607100001701619700002501636856012001661 2002 eng d00aA structure-based approach to psychological measurement: Matching measurement models to latent structure0 astructurebased approach to psychological measurement Matching me a4-160 v93 aThe present article sets forth the argument that psychological assessment should be based on a construct's latent structure. The authors differentiate dimensional (continuous) and taxonic (categorical) structures at the latent and manifest levels and describe the advantages of matching the assessment approach to the latent structure of a construct. A proper match will decrease measurement error, increase statistical power, clarify statistical relationships, and facilitate the location of an efficient cutting score when applicable. Thus, individuals will be placed along a continuum or assigned to classes more accurately. The authors briefly review the methods by which latent structure can be determined and outline a structure-based approach to assessment that builds on dimensional scaling models, such as item response theory, while incorporating classification methods as appropriate. Finally, the authors empirically demonstrate the utility of their approach and discuss its compatibility with traditional assessment methods and with computerized adaptive testing. 
10aAdaptive Testing10aAssessment10aClassification (Cognitive Process)10aComputer Assisted10aItem Response Theory10aPsychological10aScaling (Testing)10aStatistical Analysis computerized adaptive testing10aTaxonomies10aTesting1 aRuscio, John1 aRuscio, Ayelet Meron uhttp://mail.iacat.org/content/structure-based-approach-psychological-measurement-matching-measurement-models-latent01150nas a2200205 4500008004100000245008200041210006900123260005600192520043100248653002100679653003000700653001600730653001600746653001800762100001500780700001700795700001700812700001400829856010100843 2002 eng d00aThe work ahead: A psychometric infrastructure for computerized adaptive tests0 awork ahead A psychometric infrastructure for computerized adapti aMahwah, N.J. USAbLawrence Erlbaum Associates, Inc.3 a(From the chapter) Considers the past and future of computerized adaptive tests and computer-based tests and looks at issues and challenges confronting a testing program as it implements and operates a computer-based test. Recommendations for testing programs from the National Council on Measurement in Education Ad Hoc Committee on Computerized Adaptive Test Disclosure are appended. 
10aAdaptive Testing10aComputer Assisted Testing10aEducational Measurement10aPsychometrics1 aDrasgow, F1 aPotenza, M P1 aFreemer, J J1 aWard, W C uhttp://mail.iacat.org/content/work-ahead-psychometric-infrastructure-computerized-adaptive-tests01700nas a2200229 4500008004100000245007800041210006900119300001200188490000700200520094500207653002501152653005101177653003001228653001801258653001101276653002701287653001101314100001701325700001101342700001801353856009901371 2001 eng d00aComputerized adaptive testing with the generalized graded unfolding model0 aComputerized adaptive testing with the generalized graded unfold a177-1960 v253 aExamined the use of the generalized graded unfolding model (GGUM) in computerized adaptive testing. The objective was to minimize the number of items required to produce equiprecise estimates of person locations. Simulations based on real data about college student attitudes toward abortion and on data generated to fit the GGUM were used. It was found that as few as 7 or 8 items were needed to produce accurate and precise person estimates using an expected a posteriori procedure. The number of items in the item bank (20, 40, or 60 items) and their distribution on the continuum (uniform locations or item clusters in moderately extreme locations) had only small effects on the accuracy and precision of the estimates. These results suggest that adaptive testing with the GGUM is a good method for achieving estimates with an approximately uniform level of precision using a small number of items. 
10aAttitude Measurement10aCollege Students computerized adaptive testing10aComputer Assisted Testing10aItem Response10aModels10aStatistical Estimation10aTheory1 aRoberts, J S1 aLin, Y1 aLaughlin, J E uhttp://mail.iacat.org/content/computerized-adaptive-testing-generalized-graded-unfolding-model01618nas a2200193 4500008004100000245008600041210006900127300001200196490000700208520094500215653002101160653003001181653004101211653000901252653001701261100001601278700001701294856011301311 2001 eng d00aDifferences between self-adapted and computerized adaptive tests: A meta-analysis0 aDifferences between selfadapted and computerized adaptive tests a235-2470 v383 aSelf-adapted testing has been described as a variation of computerized adaptive testing that reduces test anxiety and thereby enhances test performance. The purpose of this study was to gain a better understanding of these proposed effects of self-adapted tests (SATs); meta-analysis procedures were used to estimate differences between SATs and computerized adaptive tests (CATs) in proficiency estimates and post-test anxiety levels across studies in which these two types of tests have been compared. After controlling for measurement error, the results showed that SATs yielded proficiency estimates that were 0.12 standard deviation units higher and post-test anxiety levels that were 0.19 standard deviation units lower than those yielded by CATs. The authors speculate about possible reasons for these differences and discuss advantages and disadvantages of using SATs in operational settings. 
10aAdaptive Testing10aComputer Assisted Testing10aScores computerized adaptive testing10aTest10aTest Anxiety1 aPitkin, A K1 aVispoel, W P uhttp://mail.iacat.org/content/differences-between-self-adapted-and-computerized-adaptive-tests-meta-analysis01841nas a2200229 4500008004100000245007400041210006900115300001000184490000700194520110700201653002101308653000901329653004801338653001801386653002801404653002401432653001501456100001601471700001701487700001601504856009101520 2001 eng d00aEvaluation of an MMPI-A short form: Implications for adaptive testing0 aEvaluation of an MMPIA short form Implications for adaptive test a76-890 v763 aReports some psychometric properties of an MMPI-Adolescent version (MMPI-A; J. N. Butcher et al., 1992) short form based on administration of the 1st 150 items of this test instrument. The authors report results for both the MMPI-A normative sample of 1,620 adolescents (aged 14-18 yrs) and a clinical sample of 565 adolescents (mean age 15.2 yrs) in a variety of treatment settings. The authors summarize results for the MMPI-A basic scales in terms of Pearson product-moment correlations generated between full administration and short-form administration formats and mean T score elevations for the basic scales generated by each approach. In this investigation, the authors also examine single-scale and 2-point congruences found for the MMPI-A basic clinical scales as derived from standard and short-form administrations. The authors present the relative strengths and weaknesses of the MMPI-A short form and discuss the findings in terms of implications for attempts to shorten the item pool through the use of computerized adaptive assessment approaches. 
10aAdaptive Testing10aMean10aMinnesota Multiphasic Personality Inventory10aPsychometrics10aStatistical Correlation10aStatistical Samples10aTest Forms1 aArcher, R P1 aTirrell, C A1 aElkins, D E uhttp://mail.iacat.org/content/evaluation-mmpi-short-form-implications-adaptive-testing02102nas a2200337 4500008004100000245014400041210006900185300001200254490000700266520096600273653002501239653003601264653002501300653001001325653003001335653001101365653001001376653000901386653003101395653003201426653003601458653003401494653002001528100001601548700001401564700001601578700001901594700001301613700001501626856012301641 2001 eng d00aAn examination of the comparative reliability, validity, and accuracy of performance ratings made using computerized adaptive rating scales0 aexamination of the comparative reliability validity and accuracy a965-9730 v863 aThis laboratory research compared the reliability, validity, and accuracy of a computerized adaptive rating scale (CARS) format and 2 relatively common and representative rating formats. The CARS is a paired-comparison rating task that uses adaptive testing principles to present pairs of scaled behavioral statements to the rater to iteratively estimate a ratee's effectiveness on 3 dimensions of contextual performance. Videotaped vignettes of 6 office workers were prepared, depicting prescripted levels of contextual performance, and 112 subjects rated these vignettes using the CARS format and one or the other competing format. Results showed 23%-37% lower standard errors of measurement for the CARS format. In addition, validity was significantly higher for the CARS format (d = .18), and Cronbach's accuracy coefficients showed significantly higher accuracy, with a median effect size of .08. 
The discussion focuses on possible reasons for the results.10a*Computer Simulation10a*Employee Performance Appraisal10a*Personnel Selection10aAdult10aAutomatic Data Processing10aFemale10aHuman10aMale10aReproducibility of Results10aSensitivity and Specificity10aSupport, U.S. Gov't, Non-P.H.S.10aTask Performance and Analysis10aVideo Recording1 aBorman, W C1 aBuck, D E1 aHanson, M A1 aMotowidlo, S J1 aStark, S1 aDrasgow, F uhttp://mail.iacat.org/content/examination-comparative-reliability-validity-and-accuracy-performance-ratings-made-using01466nas a2200253 4500008004100000245013900041210006900180260005000249300001200299520053700311653002100848653002500869653001200894653002900906653002200935653002700957653002600984653001501010653001601025100001501041700001601056700001701072856012301089 2001 eng d00aItem response theory applied to combinations of multiple-choice and constructed-response items--approximation methods for scale scores0 aItem response theory applied to combinations of multiplechoice a aMahwah, N.J. USAbLawrence Erlbaum Associates a289-3153 a(From the chapter) The authors develop approximate methods that replace the scoring tables with weighted linear combinations of the component scores. Topics discussed include: a linear approximation for the extension to combinations of scores; the generalization of two or more scores; potential applications of linear approximations to item response theory in computerized adaptive tests; and evaluation of the pattern-of-summed-scores, and Gaussian approximation, estimates of proficiency. 
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aItem Response Theory10aMultiple Choice (Testing Method)10aScoring (Testing)10aStatistical Estimation10aStatistical Weighting10aTest Items10aTest Scores1 aThissen, D1 aNelson, L A1 aSwygert, K A uhttp://mail.iacat.org/content/item-response-theory-applied-combinations-multiple-choice-and-constructed-response-items01986nas a2200205 4500008004100000245010100041210006900142300001200211490000700223520124900230653001201479653002101491653003001512653001501542653001601557653004501573100001701618700001901635856012601654 2001 eng d00aItem selection in computerized adaptive testing: Should more discriminating items be used first?0 aItem selection in computerized adaptive testing Should more disc a249-2660 v383 aDuring computerized adaptive testing (CAT), items are selected continuously according to the test-taker's estimated ability. Test security has become a problem because high-discrimination items are more likely to be selected and become overexposed. So, there seems to be a tradeoff between high efficiency in ability estimations and balanced usage of items. This series of four studies addressed the dilemma by focusing on the notion of whether more or less discriminating items should be used first in CAT. The first study demonstrated that the common maximum information method with J. B. Sympson and R. D. Hetter's (1985) control resulted in the use of more discriminating items first. The remaining studies showed that using items in the reverse order, as described in H. Chang and Z. Ying's (1999) stratified method, had potential advantages: (a) a more balanced item usage and (b) a relatively stable resultant item pool structure with easy and inexpensive management. This stratified method may have ability-estimation efficiency better than or close to that of other methods. It is argued that the judicious selection of items, as in the stratified method, is a more active control of item exposure. 
(PsycINFO Database Record (c) 2005 APA )10aability10aAdaptive Testing10aComputer Assisted Testing10aEstimation10aStatistical10aTest Items computerized adaptive testing1 aHau, Kit-Tai1 aChang, Hua-Hua uhttp://mail.iacat.org/content/item-selection-computerized-adaptive-testing-should-more-discriminating-items-be-used-first02032nas a2200253 4500008004100000245007700041210006900118260001200187300001200199490000700211520125800218653003901476653002901515653001501544653001001559653001101569653001101580653000901591653003001600653001301630100001601643700002001659856009901679 2001 eng d00aNCLEX-RN performance: predicting success on the computerized examination0 aNCLEXRN performance predicting success on the computerized exami cJul-Aug a158-1650 v173 aSince the adoption of the Computerized Adaptive Testing (CAT) format of the National Certification Licensure Examination for Registered Nurses (NCLEX-RN), no studies have been reported in the literature on predictors of successful performance by baccalaureate nursing graduates on the licensure examination. In this study, a discriminant analysis was used to identify which of 21 variables can be significant predictors of success on the CAT NCLEX-RN. The convenience sample consisted of 289 individuals who graduated from a baccalaureate nursing program between 1995 and 1998. Seven significant predictor variables were identified. The total number of C+ or lower grades earned in nursing theory courses was the best predictor, followed by grades in several individual nursing courses. More than 93 per cent of graduates were correctly classified. Ninety-four per cent of NCLEX "passes" were correctly classified, as were 92 per cent of NCLEX failures. This degree of accuracy in classifying CAT NCLEX-RN failures represents a marked improvement over results reported in previous studies of licensure examinations, and suggests the discriminant function will be helpful in identifying future students in danger of failure. 
J Prof Nurs 17:158-165, 2001.10a*Education, Nursing, Baccalaureate10a*Educational Measurement10a*Licensure10aAdult10aFemale10aHumans10aMale10aPredictive Value of Tests10aSoftware1 aBeeman, P B1 aWaterhouse, J K uhttp://mail.iacat.org/content/nclex-rn-performance-predicting-success-computerized-examination01860nas a2200193 4500008004100000245012400041210007100165300001200236490000700248520110700255653002101362653002601383653002201409653001401431653005901445100001601504700001701520856012901537 2001 eng d00aNouveaux développements dans le domaine du testing informatisé [New developments in the area of computerized testing]0 aNouveaux développements dans le domaine du testing informatisé N a221-2300 v463 aL'usage de l'évaluation assistée par ordinateur s'est fortement développé depuis la première formulation de ses principes de base dans les années soixante et soixante-dix. Cet article offre une introduction aux derniers développements dans le domaine de l'évaluation assistée par ordinateur, en particulier celui du testing adaptative informatisée (TAI). L'estimation de l'aptitude, la sélection des items et le développement d'une base d'items dans le cas du TAI sont discutés. De plus, des exemples d'utilisations innovantes de l'ordinateur dans des systèmes intégrés de testing et de testing via Internet sont présentés. L'article se termine par quelques illustrations de nouvelles applications du testing informatisé et des suggestions pour des recherches futures.Discusses the latest developments in computerized psychological assessment, with emphasis on computerized adaptive testing (CAT). Ability estimation, item selection, and item pool development in CAT are described. Examples of some innovative approaches to CAT are presented. 
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Applications10aComputer Assisted10aDiagnosis10aPsychological Assessment computerized adaptive testing1 aMeijer, R R1 aGrégoire, J uhttp://mail.iacat.org/content/nouveaux-d%C3%A9veloppements-dans-le-domaine-du-testing-informatis%C3%A9-new-developments-area01706nas a2200181 4500008004100000245007300041210006900114300001100183490000700194520110100201653002101302653003001323653002501353653001501378100001701393700001501410856009901425 2001 eng d00aOutlier measures and norming methods for computerized adaptive tests0 aOutlier measures and norming methods for computerized adaptive t a85-1040 v263 aNotes that the problem of identifying outliers has 2 important aspects: the choice of outlier measures and the method to assess the degree of outlyingness (norming) of those measures. Several classes of measures for identifying outliers in Computerized Adaptive Tests (CATs) are introduced. Some of these measures are constructed to take advantage of CATs' sequential choice of items; other measures are taken directly from paper and pencil (P&P) tests and are used for baseline comparisons. Assessing the degree of outlyingness of CAT responses, however, can not be applied directly from P&P tests because stopping rules associated with CATs yield examinee responses of varying lengths. Standard outlier measures are highly correlated with the varying lengths which makes comparison across examinees impossible. Therefore, 4 methods are presented and compared which map outlier statistics to a familiar probability scale (a p value). The methods are explored in the context of CAT data from a 1995 Nationally Administered Computerized Examination (NACE). 
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aStatistical Analysis10aTest Norms1 aBradlow, E T1 aWeiss, R E uhttp://mail.iacat.org/content/outlier-measures-and-norming-methods-computerized-adaptive-tests01349nas a2200181 4500008004100000245007300041210006900114260005600183300001200239520069300251653002100944653003000965653002200995653002001017100001601037700001701053856009701070 2001 eng d00aPractical issues in setting standards on computerized adaptive tests0 aPractical issues in setting standards on computerized adaptive t aMahwah, N.J. USAbLawrence Erlbaum Associates, Inc. a355-3693 a(From the chapter) Examples of setting standards on computerized adaptive tests (CATs) are hard to find. Some examples of CATs involving performance standards include the registered nurse exam and the Novell systems engineer exam. Although CATs do not require separate standard-setting methods, there are special issues to be addressed by test specialists who set performance standards on CATs. Setting standards on a CAT will typically require modifications of the procedures used with more traditional, fixed-form, paper-and-pencil examinations. The purpose of this chapter is to illustrate why CATs pose special challenges to the standard setter. 
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aPerformance Tests10aTesting Methods1 aSireci, S G1 aClauser, B E uhttp://mail.iacat.org/content/practical-issues-setting-standards-computerized-adaptive-tests01592nas a2200205 4500008004100000245016500041210006900206300001200275490000700287520076900294653002101063653002601084653003001110653002501140653005101165100001301216700001701229700002101246856011901267 2001 eng d00aToepassing van een computergestuurde adaptieve testprocedure op persoonlijkheidsdata [Application of a computerised adaptive test procedure on personality data]0 aToepassing van een computergestuurde adaptieve testprocedure op a119-1330 v563 aStudied the applicability of a computerized adaptive testing procedure to an existing personality questionnaire within the framework of item response theory. The procedure was applied to the scores of 1,143 male and female university students (mean age 21.8 yrs) in the Netherlands on the Neuroticism scale of the Amsterdam Biographical Questionnaire (G. J. Wilde, 1963). The graded response model (F. Samejima, 1969) was used. The quality of the adaptive test scores was measured based on their correlation with test scores for the entire item bank and on their correlation with scores on other scales from the personality test. The results indicate that computerized adaptive testing can be applied to personality scales. 
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Applications10aComputer Assisted Testing10aPersonality Measures10aTest Reliability computerized adaptive testing1 aHol, A M1 aVorst, H C M1 aMellenbergh, G J uhttp://mail.iacat.org/content/toepassing-van-een-computergestuurde-adaptieve-testprocedure-op-persoonlijkheidsdata01588nas a2200241 4500008004100000245005800041210005800099300001200157490000600169520081000175653001400985653001400999653004801013653005701061653001101118653001801129653003101147653003701178100001301215700001701228700001601245856008501261 2000 eng d00aCAT administration of language placement examinations0 aCAT administration of language placement examinations a292-3020 v13 aThis article describes the development of a computerized adaptive test for Cegep de Jonquiere, a community college located in Quebec, Canada. Computerized language proficiency testing allows the simultaneous presentation of sound stimuli as the question is being presented to the test-taker. With a properly calibrated bank of items, the language proficiency test can be offered in an adaptive framework. By adapting the test to the test-taker's level of ability, an assessment can be made with significantly fewer items. We also describe our initial attempt to detect instances in which "cheating low" is occurring. In the "cheating low" situation, test-takers deliberately answer questions incorrectly, questions that they are fully capable of answering correctly had they been taking the test honestly.10a*Language10a*Software10aAptitude Tests/*statistics & numerical data10aEducational Measurement/*statistics & numerical data10aHumans10aPsychometrics10aReproducibility of Results10aResearch Support, Non-U.S. 
Gov't1 aStahl, J1 aBergstrom, B1 aGershon, RC uhttp://mail.iacat.org/content/cat-administration-language-placement-examinations01390nas a2200193 4500008004100000245009400041210006900135300001200204490000700216520067900223653002100902653003000923653002500953653005700978100001401035700001901049700001901068856010901087 2000 eng d00aA comparison of item selection rules at the early stages of computerized adaptive testing0 acomparison of item selection rules at the early stages of comput a241-2550 v243 aThe effects of 5 item selection rules--Fisher information (FI), Fisher interval information (FII), Fisher information with a posterior distribution (FIP), Kullback-Leibler information (KL), and Kullback-Leibler information with a posterior distribution (KLP)--were compared with respect to the efficiency and precision of trait (θ) estimation at the early stages of computerized adaptive testing (CAT). FII, FIP, KL, and KLP performed marginally better than FI at the early stages of CAT for θ=-3 and -2. For tests longer than 10 items, there appeared to be no precision advantage for any of the selection rules. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aComputer Assisted Testing10aItem Analysis (Test)10aStatistical Estimation computerized adaptive testing1 aChen, S-Y1 aAnkenmann, R D1 aChang, Hua-Hua uhttp://mail.iacat.org/content/comparison-item-selection-rules-early-stages-computerized-adaptive-testing01564nas a2200229 4500008004100000245006400041210006300105300001100168490000600179520083800185653002701023653001501050653001501065653004201080653001101122653002601133653002601159653003101185100001501216700001601231856008701247 2000 eng d00aComputerization and adaptive administration of the NEO PI-R0 aComputerization and adaptive administration of the NEO PIR a347-640 v73 aThis study asks, how well does an item response theory (IRT) based computerized adaptive NEO PI-R work? 
To explore this question, real-data simulations (N = 1,059) were used to evaluate a maximum information item selection computerized adaptive test (CAT) algorithm. Findings indicated satisfactory recovery of full-scale facet scores with the administration of around four items per facet scale. Thus, the NEO PI-R could be reduced in half with little loss in precision by CAT administration. However, results also indicated that the CAT algorithm was not necessary. We found that for many scales, administering the "best" four items per facet scale would have produced similar results. In the conclusion, we discuss the future of computerized personality assessment and describe the role IRT methods might play in such assessments.10a*Personality Inventory10aAlgorithms10aCalifornia10aDiagnosis, Computer-Assisted/*methods10aHumans10aModels, Psychological10aPsychometrics/methods10aReproducibility of Results1 aReise, S P1 aHenson, J M uhttp://mail.iacat.org/content/computerization-and-adaptive-administration-neo-pi-r03275nas a2200217 4500008004100000245020800041210007000249300001000319490000700329520241300336653002102749653002402770653003202794653001302826100001902839700001802858700001702876700001802893700001702911856012902928 2000 eng d00aDiagnostische programme in der Demenzfrüherkennung: Der Adaptive Figurenfolgen-Lerntest (ADAFI) [Diagnostic programs in the early detection of dementia: The Adaptive Figure Series Learning Test (ADAFI)]0 aDiagnostische programme in der Demenzfrüherkennung Der Adaptive a16-290 v133 aZusammenfassung: Untersucht wurde die Eignung des computergestützten Adaptiven Figurenfolgen-Lerntests (ADAFI), zwischen gesunden älteren Menschen und älteren Menschen mit erhöhtem Demenzrisiko zu differenzieren. 
Der im ADAFI vorgelegte Aufgabentyp der fluiden Intelligenzdimension (logisches Auffüllen von Figurenfolgen) hat sich in mehreren Studien zur Erfassung des intellektuellen Leistungspotentials (kognitive Plastizität) älterer Menschen als günstig für die genannte Differenzierung erwiesen. Aufgrund seiner Konzeption als Diagnostisches Programm fängt der ADAFI allerdings einige Kritikpunkte an Vorgehensweisen in diesen bisherigen Arbeiten auf. Es konnte gezeigt werden, a) daß mit dem ADAFI deutliche Lokationsunterschiede zwischen den beiden Gruppen darstellbar sind, b) daß mit diesem Verfahren eine gute Vorhersage des mentalen Gesundheitsstatus der Probanden auf Einzelfallebene gelingt (Sensitivität: 80 %, Spezifität: 90 %), und c) daß die Vorhersageleistung statusdiagnostischer Tests zur Informationsverarbeitungsgeschwindigkeit und zum Arbeitsgedächtnis geringer ist. Die Ergebnisse weisen darauf hin, daß die plastizitätsorientierte Leistungserfassung mit dem ADAFI vielversprechend für die Frühdiagnostik dementieller Prozesse sein könnte.The aim of this study was to examine the ability of the computerized Adaptive Figure Series Learning Test (ADAFI) to differentiate among old subjects at risk for dementia and old healthy controls. Several studies on the subject of measuring the intellectual potential (cognitive plasticity) of old subjects have shown the usefulness of the fluid intelligence type of task used in the ADAFI (completion of figure series) for this differentiation. Because the ADAFI has been developed as a Diagnostic Program, it is able to counter some critical issues in those studies. It was shown a) that distinct differences between both groups are revealed by the ADAFI, b) that the prediction of the cognitive health status of individual subjects is quite good (sensitivity: 80 %, specificity: 90 %), and c) that the prediction of the cognitive health status with tests of processing speed and working memory is worse than with the ADAFI. 
The results indicate that the ADAFI might be a promising plasticity-oriented tool for the measurement of cognitive decline in the elderly, and thus might be useful for the early detection of dementia.10aAdaptive Testing10aAt Risk Populations10aComputer Assisted Diagnosis10aDementia1 aSchreiber, M D1 aSchneider, RJ1 aSchweizer, A1 aBeckmann, J F1 aBaltissen, R uhttp://mail.iacat.org/content/diagnostische-programme-der-demenzfr%C3%BCherkennung-der-adaptive-figurenfolgen-lerntest-adafi01433nas a2200193 4500008004100000245006300041210006300104300001200167490000700179520079500186653001800981653002100999653003001020653001801050653005701068100001501125700001201140856008701152 2000 eng d00aEstimation of trait level in computerized adaptive testing0 aEstimation of trait level in computerized adaptive testing a257-2650 v243 aNotes that in computerized adaptive testing (CAT), an examinee's trait level (θ) must be estimated with reasonable accuracy based on a small number of item responses. A successful implementation of CAT depends on (1) the accuracy of statistical methods used for estimating θ and (2) the efficiency of the item-selection criterion. Methods of estimating θ suitable for CAT are reviewed, and the differences between Fisher and Kullback-Leibler information criteria for selecting items are discussed. The accuracy of different CAT algorithms was examined in an empirical study. The results show that correcting θ estimates for bias was necessary at earlier stages of CAT, but most CAT algorithms performed equally well for tests of 10 or more items. 
(PsycINFO Database Record (c) 2005 APA )10a(Statistical)10aAdaptive Testing10aComputer Assisted Testing10aItem Analysis10aStatistical Estimation computerized adaptive testing1 aCheng, P E1 aLiou, M uhttp://mail.iacat.org/content/estimation-trait-level-computerized-adaptive-testing02316nas a2200205 4500008004100000245012100041210006900162300000800231490000700239520159200246653002101838653003001859653002201889653001801911653001601929653000901945653001801954100001401972856012401986 2000 eng d00aAn examination of the reliability and validity of performance ratings made using computerized adaptive rating scales0 aexamination of the reliability and validity of performance ratin a5700 v613 aThis study compared the psychometric properties of performance ratings made using recently-developed computerized adaptive rating scales (CARS) to the psychometric properties of ratings made using more traditional paper-and-pencil rating formats, i.e., behaviorally-anchored and graphic rating scales. Specifically, the reliability, validity and accuracy of the performance ratings from each format were examined. One hundred twelve participants viewed six 5-minute videotapes of office situations and rated the performance of a target person in each videotape on three contextual performance dimensions-Personal Support, Organizational Support, and Conscientious Initiative-using CARS and either behaviorally-anchored or graphic rating scales. Performance rating properties were measured using Shrout and Fleiss's intraclass correlation (2, 1), Borman's differential accuracy measure, and Cronbach's accuracy components as indexes of rating reliability, validity, and accuracy, respectively. Results found that performance ratings made using the CARS were significantly more reliable and valid than performance ratings made using either of the other formats. Additionally, CARS yielded more accurate performance ratings than the paper-and-pencil formats. 
The nature of the CARS system (i.e., its adaptive nature and scaling methodology) and its paired comparison judgment task are offered as possible reasons for the differences found in the psychometric properties of the performance ratings made using the various rating formats. (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aPerformance Tests10aRating Scales10aReliability10aTest10aTest Validity1 aBuck, D E uhttp://mail.iacat.org/content/examination-reliability-and-validity-performance-ratings-made-using-computerized-adaptive01173nas a2200205 4500008004100000245005600041210005300097300001200150490000700162520055300169653002200722653002500744653002500769653002200794653001500816100002300831700001800854700001500872856008000887 2000 eng d00aAn integer programming approach to item bank design0 ainteger programming approach to item bank design a139-1500 v243 aAn integer programming approach to item bank design is presented that can be used to calculate an optimal blueprint for an item bank, in order to support an existing testing program. The results are optimal in that they minimize the effort involved in producing the items as revealed by current item writing patterns. Also presented is an adaptation of the models, which can be used as a set of monitoring tools in item bank management. The approach is demonstrated empirically for an item bank that was designed for the Law School Admission Test. 
10aAptitude Measures10aItem Analysis (Test)10aItem Response Theory10aTest Construction10aTest Items1 avan der Linden, WJ1 aVeldkamp, B P1 aReese, L M uhttp://mail.iacat.org/content/integer-programming-approach-item-bank-design01775nas a2200289 4500008004100000245007700041210006900118300001400187490000700201520080000208653002501008653003101033653003701064653003801101653001901139653001001158653002701168653004601195653002001241653002801261653003201289653001801321100001401339700001701353700001501370856010001385 2000 eng d00aItem response theory and health outcomes measurement in the 21st century0 aItem response theory and health outcomes measurement in the 21st aII28-II420 v383 aItem response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods.10a*Models, Statistical10aActivities of Daily Living10aData Interpretation, Statistical10aHealth Services Research/*methods10aHealth Surveys10aHuman10aMathematical Computing10aOutcome Assessment (Health Care)/*methods10aResearch Design10aSupport, Non-U.S. Gov't10aSupport, U.S. 
Gov't, P.H.S.10aUnited States1 aHays, R D1 aMorales, L S1 aReise, S P uhttp://mail.iacat.org/content/item-response-theory-and-health-outcomes-measurement-21st-century02719nas a2200169 4500008004100000245011500041210006900156300000900225490000700234520208500241653001302326653002802339653002602367653001602393100001602409856012402425 2000 eng d00aLagrangian relaxation for constrained curve-fitting with binary variables: Applications in educational testing0 aLagrangian relaxation for constrained curvefitting with binary v a10630 v613 aThis dissertation offers a mathematical programming approach to curve fitting with binary variables. Various Lagrangian Relaxation (LR) techniques are applied to constrained curve fitting. Applications in educational testing with respect to test assembly are utilized. In particular, techniques are applied to both static exams (i.e. conventional paper-and-pencil (P&P)) and adaptive exams (i.e. a hybrid computerized adaptive test (CAT) called a multiple-forms structure (MFS)). This dissertation focuses on the development of mathematical models to represent these test assembly problems as constrained curve-fitting problems with binary variables and solution techniques for the test development. Mathematical programming techniques are used to generate parallel test forms with item characteristics based on item response theory. A binary variable is used to represent whether or not an item is present on a form. The problem of creating a test form is modeled as a network flow problem with additional constraints. In order to meet the target information and the test characteristic curves, a Lagrangian relaxation heuristic is applied to the problem. The Lagrangian approach works by multiplying the constraint by a "Lagrange multiplier" and adding it to the objective. By systematically varying the multiplier, the test form curves approach the targets. 
This dissertation explores modifications to Lagrangian Relaxation as it is applied to the classical paper-and-pencil exams. For the P&P exams, LR techniques are also utilized to include additional practical constraints to the network problem, which limit the item selection. An MFS is a type of a computerized adaptive test. It is a hybrid of a standard CAT and a P&P exam. The concept of an MFS will be introduced in this dissertation, as well as, the application of LR as it is applied to constructing parallel MFSs. The approach is applied to the Law School Admission Test for the assembly of the conventional P&P test as well as an experimental computerized test using MFSs. (PsycINFO Database Record (c) 2005 APA )10aAnalysis10aEducational Measurement10aMathematical Modeling10aStatistical1 aKoppel, N B uhttp://mail.iacat.org/content/lagrangian-relaxation-constrained-curve-fitting-binary-variables-applications-educational00868nas a2200145 4500008004100000245006600041210006600107300001200173490000700185520036100192653002100553653004400574100001500618856008900633 2000 eng d00aOverview of the computerized adaptive testing special section0 aOverview of the computerized adaptive testing special section a115-1200 v213 aThis paper provides an overview of the five papers included in the Psicologica special section on computerized adaptive testing. A short introduction to this topic is presented as well. The main results, the links between the five papers and the general research topic to which they are more related are also shown. 
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputers computerized adaptive testing1 aPonsoda, V uhttp://mail.iacat.org/content/overview-computerized-adaptive-testing-special-section01773nas a2200265 4500008004100000245004900041210004800090300001000138490000600148520092600154653002501080653005701105653001501162653001201177653001001189653002101199653006401220653001101284653002201295653001101317653000901328653007801337100001701415856007501432 1999 eng d00aCompetency gradient for child-parent centers0 aCompetency gradient for childparent centers a35-520 v33 aThis report describes an implementation of the Rasch model during the longitudinal evaluation of a federally-funded early childhood preschool intervention program. An item bank is described for operationally defining a psychosocial construct called community life-skills competency, an expected teenage outcome of the preschool intervention. This analysis examined the position of teenage students on this scale structure, and investigated a pattern of cognitive operations necessary for students to pass community life-skills test items. Then this scale structure was correlated with nationally standardized reading and math achievement scores, teacher ratings, and school records to assess its validity as a measure of the community-related outcome goal for this intervention. 
The results show a functional relationship between years of early intervention and magnitude of effect on the life-skills competency variable.10a*Models, Statistical10aActivities of Daily Living/classification/psychology10aAdolescent10aChicago10aChild10aChild, Preschool10aEarly Intervention (Education)/*statistics & numerical data10aFemale10aFollow-Up Studies10aHumans10aMale10aOutcome and Process Assessment (Health Care)/*statistics & numerical data1 aBezruczko, N uhttp://mail.iacat.org/content/competency-gradient-child-parent-centers02025nas a2200289 4500008004100000245012700041210006900168300001300237490000800250520105800258653003401316653003301350653002301383653001501406653001001421653002601431653001001457653001501467653001801482653003101500100001501531700001601546700001401562700001701576700001601593856012601609 1997 eng d00aA computerized adaptive testing system for speech discrimination measurement: The Speech Sound Pattern Discrimination Test0 acomputerized adaptive testing system for speech discrimination m a2289-2980 v1013 aA computerized, adaptive test-delivery system for the measurement of speech discrimination, the Speech Sound Pattern Discrimination Test, is described and evaluated. Using a modified discrimination task, the testing system draws on a pool of 130 items spanning a broad range of difficulty to estimate an examinee's location along an underlying continuum of speech processing ability, yet does not require the examinee to possess a high level of English language proficiency. The system is driven by a mathematical measurement model which selects only test items which are appropriate in difficulty level for a given examinee, thereby individualizing the testing experience. Test items were administered to a sample of young deaf adults, and the adaptive testing system evaluated in terms of respondents' sensory and perceptual capabilities, acoustic and phonetic dimensions of speech, and theories of speech perception. 
Data obtained in this study support the validity, reliability, and efficiency of this test as a measure of speech processing ability.10a*Diagnosis, Computer-Assisted10a*Speech Discrimination Tests10a*Speech Perception10aAdolescent10aAdult10aAudiometry, Pure-Tone10aHuman10aMiddle Age10aPsychometrics10aReproducibility of Results1 aBochner, J1 aGarrison, W1 aPalmer, L1 aMacKenzie, D1 aBraveman, A uhttp://mail.iacat.org/content/computerized-adaptive-testing-system-speech-discrimination-measurement-speech-sound-pattern01734nas a2200169 4500008004100000245009900041210006900140300001200209490000700221520112300228653002101351653003001372653000801402653002301410100001601433856011501449 1997 eng d00aThe distribution of indexes of person fit within the computerized adaptive testing environment0 adistribution of indexes of person fit within the computerized ad a115-1270 v213 aThe extent to which a trait estimate represents the underlying latent trait of interest can be estimated by using indexes of person fit. Several statistical methods for indexing person fit have been proposed to identify nonmodel-fitting response vectors. These person-fit indexes have generally been found to follow a standard normal distribution for conventionally administered tests. The present investigation found that within the context of computerized adaptive testing (CAT) these indexes tended not to follow a standard normal distribution. As the item pool became less discriminating, as the CAT termination criterion became less stringent, and as the number of items in the pool decreased, the distributions of the indexes approached a standard normal distribution. It was determined that under these conditions the indexes' distributions approached standard normal distributions because more items were being administered. However, even when over 50 items were administered in a CAT the indexes were distributed in a fashion that was different from what was expected. 
10aAdaptive Testing10aComputer Assisted Testing10aFit10aPerson Environment1 aNering, M L uhttp://mail.iacat.org/content/distribution-indexes-person-fit-within-computerized-adaptive-testing-environment01319nas a2200265 4500008004100000245005500041210005400096300001200150490000600162520053000168653003800698653002000736653001400756653003500770653003100805653001100836653001900847653001800866653002800884100001300912700001500925700001700940700001400957856008200971 1997 eng d00aOn-line performance assessment using rating scales0 aOnline performance assessment using rating scales a173-1910 v13 aThe purpose of this paper is to report on the development of the on-line performance assessment instrument--the Assessment of Motor and Process Skills (AMPS). Issues that will be addressed in the paper include: (a) the establishment of the scoring rubric and its implementation in an extended Rasch model, (b) training of raters, (c) validation of the scoring rubric and procedures for monitoring the internal consistency of raters, and (d) technological implementation of the assessment instrument in a computerized program.10a*Outcome Assessment (Health Care)10a*Rehabilitation10a*Software10a*Task Performance and Analysis10aActivities of Daily Living10aHumans10aMicrocomputers10aPsychometrics10aPsychomotor Performance1 aStahl, J1 aShumway, R1 aBergstrom, B1 aFisher, A uhttp://mail.iacat.org/content/line-performance-assessment-using-rating-scales01698nas a2200229 4500008004100000020004100041245006400082210006200146250001500208260000800223300001100231490000700242520102000249653003101269653002501300653001001325653001101335653001101346653000901357100001601366856008601382 1995 jpn d a0021-5236 (Print)0021-5236 (Linking)00aA study of psychologically optimal level of item difficulty0 astudy of psychologically optimal level of item difficulty a1995/02/01 cFeb a446-530 v653 aFor the purpose of selecting items in a test, this study presented a viewpoint of 
psychologically optimal difficulty level, as well as measurement efficiency, of items. A paper-and-pencil test (P & P) composed of hard, moderate and easy subtests was administered to 298 students at a university. A computerized adaptive test (CAT) was also administered to 79 students. The items of both tests were selected from Shiba's Word Meaning Comprehension Test, for which the estimates of parameters of two-parameter item response model were available. The results of P & P research showed that the psychologically optimal success level would be such that the proportion of right answers is somewhere between .75 and .85. A similar result was obtained from CAT research, where the proportion of about .8 might be desirable. Traditionally a success rate of .5 has been recommended in adaptive testing. In this study, however, it was suggested that the items of such level would be too hard psychologically for many examinees.10a*Adaptation, Psychological10a*Psychological Tests10aAdult10aFemale10aHumans10aMale1 aFujimori, S uhttp://mail.iacat.org/content/study-psychologically-optimal-level-item-difficulty00758nas a2200241 4500008004100000020002200041245006700063210006400130250001500194260000800209300001100217490000700228653001500235653002600250653003700276653002300313653001800336100002100354700001400375700002400389700001400413856008900427 1993 eng d a0744-6314 (Print)00aMoving in a new direction: Computerized adaptive testing (CAT)0 aMoving in a new direction Computerized adaptive testing CAT a1993/01/01 cJan a80, 820 v2410a*Computers10aAccreditation/methods10aEducational Measurement/*methods10aLicensure, Nursing10aUnited States1 aJones-Dickson, C1 aDorsey, D1 aCampbell-Warnock, J1 aFields, F uhttp://mail.iacat.org/content/moving-new-direction-computerized-adaptive-testing-cat