%0 Journal Article %J Journal of Computerized Adaptive Testing %D 2023 %T An Extended Taxonomy of Variants of Computerized Adaptive Testing %A Roy Levy %A John T. Behrens %A Robert J. Mislevy %K Adaptive Testing %K evidence-centered design %K Item Response Theory %K knowledge-based model construction %K missingness %B Journal of Computerized Adaptive Testing %V 10 %G English %N 1 %R 10.7333/2302-100101 %0 Journal Article %J Applied Psychological Measurement %D 2020 %T New Efficient and Practicable Adaptive Designs for Calibrating Items Online %A Yinhong He %A Ping Chen %A Yong Li %X When calibrating new items online, it is practicable to first compare all new items according to some criterion and then assign the most suitable one to the current examinee who reaches a seeding location. The modified D-optimal design proposed by van der Linden and Ren (denoted as D-VR design) works within this practicable framework with the aim of directly optimizing the estimation of item parameters. However, the optimal design point for a given new item should be obtained by comparing all examinees in a static examinee pool. Thus, D-VR design still has room for improvement in calibration efficiency from the view of traditional optimal design. To this end, this article incorporates the idea of traditional optimal design into D-VR design and proposes a new online calibration design criterion, namely, excellence degree (ED) criterion. Four different schemes are developed to measure the information provided by the current examinee when implementing this new criterion, and four new ED designs equipped with them are put forward accordingly. Simulation studies were conducted under a variety of conditions to compare the D-VR design and the four proposed ED designs in terms of calibration efficiency. Results showed that the four ED designs outperformed D-VR design in almost all simulation conditions. 
%B Applied Psychological Measurement %V 44 %P 3-16 %U https://doi.org/10.1177/0146621618824854 %R 10.1177/0146621618824854 %0 Journal Article %J Journal of Educational Measurement %D 2019 %T Computerized Adaptive Testing in Early Education: Exploring the Impact of Item Position Effects on Ability Estimation %A Albano, Anthony D. %A Cai, Liuhan %A Lease, Erin M. %A McConnell, Scott R. %X Studies have shown that item difficulty can vary significantly based on the context of an item within a test form. In particular, item position may be associated with practice and fatigue effects that influence item parameter estimation. The purpose of this research was to examine the relevance of item position specifically for assessments used in early education, an area of testing that has received relatively limited psychometric attention. In an initial study, multilevel item response models fit to data from an early literacy measure revealed statistically significant increases in difficulty for items appearing later in a 20-item form. The estimated linear change in logits for an increase of 1 in position was .024, resulting in a predicted change of .46 logits for a shift from the beginning to the end of the form. A subsequent simulation study examined impacts of item position effects on person ability estimation within computerized adaptive testing. Implications and recommendations for practice are discussed. %B Journal of Educational Measurement %V 56 %P 437-451 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12215 %R 10.1111/jedm.12215 %0 Journal Article %J Applied Psychological Measurement %D 2019 %T An Investigation of Exposure Control Methods With Variable-Length CAT Using the Partial Credit Model %A Audrey J. Leroux %A J. Kay Waid-Ebbs %A Pey-Shan Wen %A Drew A. Helmer %A David P. Graham %A Maureen K. 
O’Connor %A Kathleen Ray %X The purpose of this simulation study was to investigate the effect of several different item exposure control procedures in computerized adaptive testing (CAT) with variable-length stopping rules using the partial credit model. Previous simulation studies on CAT exposure control methods with polytomous items rarely considered variable-length tests. The four exposure control techniques examined were the randomesque with a group of three items, randomesque with a group of six items, progressive-restricted standard error (PR-SE), and no exposure control. The two variable-length stopping rules included were the SE and predicted standard error reduction (PSER), along with three item pools of varied sizes (43, 86, and 172 items). Descriptive statistics on number of nonconvergent cases, measurement precision, testing burden, item overlap, item exposure, and pool utilization were calculated. Results revealed that the PSER stopping rule administered fewer items on average while maintaining measurement precision similar to the SE stopping rule across the different item pool sizes and exposure controls. The PR-SE exposure control procedure surpassed the randomesque methods by further reducing test overlap, maintaining maximum exposure rates at the target rate or lower, and utilizing all items from the pool with a minimal increase in number of items administered and nonconvergent cases. %B Applied Psychological Measurement %V 43 %P 624-638 %U https://doi.org/10.1177/0146621618824856 %R 10.1177/0146621618824856 %0 Journal Article %J Educational and Psychological Measurement %D 2019 %T Item Selection Criteria With Practical Constraints in Cognitive Diagnostic Computerized Adaptive Testing %A Chuan-Ju Lin %A Hua-Hua Chang %X For item selection in cognitive diagnostic computerized adaptive testing (CD-CAT), ideally, a single item selection index should be created to simultaneously regulate precision, exposure status, and attribute balancing. 
For this purpose, in this study, we first proposed an attribute-balanced item selection criterion, namely, the standardized weighted deviation global discrimination index (SWDGDI), and subsequently formulated the constrained progressive index (CP_SWDGDI) by casting the SWDGDI in a progressive algorithm. A simulation study revealed that the SWDGDI method was effective in balancing attribute coverage and that the CP_SWDGDI method was able to simultaneously balance attribute coverage and item pool usage while maintaining acceptable estimation precision. This research also demonstrates the advantage of a relatively low number of attributes in CD-CAT applications. %B Educational and Psychological Measurement %V 79 %P 335-357 %U https://doi.org/10.1177/0013164418790634 %R 10.1177/0013164418790634 %0 Journal Article %J Journal of Educational Measurement %D 2019 %T Routing Strategies and Optimizing Design for Multistage Testing in International Large-Scale Assessments %A Svetina, Dubravka %A Liaw, Yuan-Ling %A Rutkowski, Leslie %A Rutkowski, David %X This study investigates the effect of several design and administration choices on item exposure and person/item parameter recovery under a multistage test (MST) design. In a simulation study, we examine whether number-correct (NC) or item response theory (IRT) methods are differentially effective at routing students to the correct next stage(s) and whether routing choices (optimal versus suboptimal routing) have an impact on achievement precision. Additionally, we examine the impact of testlet length on both person and item recovery. Overall, our results suggest that no single approach works best across the studied conditions. With respect to mean person parameter recovery, IRT scoring (via either Fisher information or preliminary EAP estimates) outperformed classical NC methods, although differences in bias and root mean squared error were generally small. 
Item exposure rates were found to be more evenly distributed when suboptimal routing methods were used, and item recovery (both difficulty and discrimination) was most precise for items with moderate difficulties. Based on the results of the simulation study, we draw conclusions and discuss implications for practice in the context of international large-scale assessments that recently introduced adaptive assessment in the form of MST. Future research directions are also discussed. %B Journal of Educational Measurement %V 56 %P 192-213 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12206 %R 10.1111/jedm.12206 %0 Journal Article %J Journal of Educational Measurement %D 2018 %T A Comparison of Constraint Programming and Mixed-Integer Programming for Automated Test-Form Generation %A Li, Jie %A van der Linden, Wim J. %X The final step of the typical process of developing educational and psychological tests is to place the selected test items in a formatted form. The step involves the grouping and ordering of the items to meet a variety of formatting constraints. As this activity tends to be time-intensive, the use of mixed-integer programming (MIP) has been proposed to automate it. The goal of this article is to show how constraint programming (CP) can be used as an alternative to automate test-form generation problems with a large variety of formatting constraints, and how it compares with MIP-based form generation in terms of its models, solutions, and running times. Two empirical examples are presented: (i) automated generation of a computerized fixed form; and (ii) automated generation of shadow tests for multistage testing. Both examples show that CP works well, with feasible solutions and running times likely to be better than those for MIP-based applications. 
%B Journal of Educational Measurement %V 55 %P 435-456 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12187 %R 10.1111/jedm.12187 %0 Journal Article %J Journal of Educational Measurement %D 2018 %T Evaluation of a New Method for Providing Full Review Opportunities in Computerized Adaptive Testing—Computerized Adaptive Testing With Salt %A Cui, Zhongmin %A Liu, Chunyan %A He, Yong %A Chen, Hanwei %X Allowing item review in computerized adaptive testing (CAT) is getting more attention in the educational measurement field as more and more testing programs adopt CAT. The research literature has shown that allowing item review in an educational test could result in more accurate estimates of examinees’ abilities. The practice of item review in CAT, however, is hindered by the potential danger of test-manipulation strategies. To provide review opportunities to examinees while minimizing the effect of test-manipulation strategies, researchers have proposed different algorithms to implement CAT with restricted revision options. In this article, we propose and evaluate a new method that implements CAT without any restriction on item review. In particular, we evaluate the new method in terms of the accuracy of ability estimates and the robustness against test-manipulation strategies. This study shows that the newly proposed method is promising in creating a win-win situation: examinees have full freedom to review and change answers, and the impacts of test-manipulation strategies are undermined. 
%B Journal of Educational Measurement %V 55 %P 582-594 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12193 %R 10.1111/jedm.12193 %0 Journal Article %J Applied Psychological Measurement %D 2018 %T Latent Class Analysis of Recurrent Events in Problem-Solving Items %A Haochen Xu %A Guanhua Fang %A Yunxiao Chen %A Jingchen Liu %A Zhiliang Ying %X Computer-based assessment of complex problem-solving abilities is becoming more and more popular. In such an assessment, the entire problem-solving process of an examinee is recorded, providing detailed information about the individual, such as behavioral patterns, speed, and learning trajectory. 
The problem-solving processes are recorded in a computer log file, which is a time-stamped documentation of events related to task completion. As opposed to cross-sectional response data from traditional tests, process data in log files are massive and irregularly structured, calling for effective exploratory data analysis methods. Motivated by a specific complex problem-solving item “Climate Control” in the 2012 Programme for International Student Assessment, the authors propose a latent class analysis approach to analyzing the events that occurred in the problem-solving processes. The exploratory latent class analysis yields meaningful latent classes. Simulation studies are conducted to evaluate the proposed approach. %B Applied Psychological Measurement %V 42 %P 478-498 %U https://doi.org/10.1177/0146621617748325 %R 10.1177/0146621617748325 %0 Journal Article %J Journal of Educational Measurement %D 2018 %T On-the-Fly Constraint-Controlled Assembly Methods for Multistage Adaptive Testing for Cognitive Diagnosis %A Liu, Shuchang %A Cai, Yan %A Tu, Dongbo %X This study applied the on-the-fly assembled multistage adaptive testing mode to cognitive diagnosis (CD-OMST). Several module assembly methods for CD-OMST were proposed and compared in terms of measurement precision, test security, and constraint management. The module assembly methods in the study included the maximum priority index method (MPI), the revised maximum priority index (RMPI), the weighted deviation model (WDM), and two revised Monte Carlo methods (R1-MC, R2-MC). 
Simulation results showed that, on the whole, CD-OMST performs well in that it not only has acceptable attribute pattern correct classification rates but also satisfies both statistical and nonstatistical constraints; the RMPI method was generally better than the MPI method, the R2-MC method was generally better than the R1-MC method, and the two revised Monte Carlo methods performed best in terms of test security and constraint management, whereas the RMPI and WDM methods worked best in terms of measurement precision. The study is expected not only to provide information about how to combine MST and CD using an on-the-fly method and how these assembly methods in CD-OMST perform relative to each other, but also to offer guidance for practitioners assembling modules in CD-OMST under both statistical and nonstatistical constraints. %B Journal of Educational Measurement %V 55 %P 595-613 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12194 %R 10.1111/jedm.12194 %0 Journal Article %J Journal of Educational Measurement %D 2018 %T A Top-Down Approach to Designing the Computerized Adaptive Multistage Test %A Luo, Xiao %A Kim, Doyoung %X The top-down approach to designing a multistage test is relatively understudied in the literature and underused in research and practice. This study introduced a route-based top-down design approach that directly sets design parameters at the test level and utilizes an advanced automated test assembly algorithm that seeks global optimality. The design process in this approach consists of five sub-processes: (1) route mapping, (2) setting objectives, (3) setting constraints, (4) routing error control, and (5) test assembly. Results from a simulation study confirmed that the assembly, measurement, and routing results of the top-down design eclipsed those of the bottom-up design. Additionally, the top-down design approach provided unique insights into design decisions that could be used to refine the test. 
Despite these advantages, it is recommended that the top-down and bottom-up approaches be applied in a complementary manner in practice. %B Journal of Educational Measurement %V 55 %P 243-263 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12174 %R 10.1111/jedm.12174 %0 Journal Article %J Applied Psychological Measurement %D 2018 %T Using Automatic Item Generation to Create Solutions and Rationales for Computerized Formative Testing %A Mark J. Gierl %A Hollis Lai %X Computerized testing provides many benefits to support formative assessment. However, the advent of computerized formative testing has also raised formidable new challenges, particularly in the area of item development. Large numbers of diverse, high-quality test items are required because items are continuously administered to students. Hence, hundreds of items are needed to develop the banks necessary for computerized formative testing. One promising approach that may be used to address this test development challenge is automatic item generation. Automatic item generation is a relatively new but rapidly evolving research area where cognitive and psychometric modeling practices are used to produce items with the aid of computer technology. The purpose of this study is to describe a new method for generating both the items and the rationales required to solve the items to produce the required feedback for computerized formative testing. The method for rationale generation is demonstrated and evaluated in the medical education domain. %B Applied Psychological Measurement %V 42 %P 42-57 %U https://doi.org/10.1177/0146621617726788 %R 10.1177/0146621617726788 %0 Journal Article %J Applied Psychological Measurement %D 2017 %T Is a Computerized Adaptive Test More Motivating Than a Fixed-Item Test? %A Guangming Ling %A Yigal Attali %A Bridgid Finn %A Elizabeth A. 
Stone %X Computer adaptive tests provide important measurement advantages over traditional fixed-item tests, but research on the psychological reactions of test takers to adaptive tests is lacking. In particular, it has been suggested that test-taker engagement, and possibly test performance as a consequence, could benefit from the control that adaptive tests have on the number of test items examinees answer correctly. However, previous research on this issue found little support for this possibility. This study expands on previous research by examining this issue in the context of a mathematical ability assessment and by considering the possible effect of immediate feedback of response correctness on test engagement, test anxiety, time on task, and test performance. Middle school students completed a mathematics assessment under one of three test type conditions (fixed, adaptive, or easier adaptive) and either with or without immediate feedback about the correctness of responses. Results showed little evidence for test type effects. The easier adaptive test resulted in higher engagement and lower anxiety than either the adaptive or fixed-item tests; however, no significant differences in performance were found across test types, although performance was significantly higher across all test types when students received immediate feedback. In addition, these effects were not related to ability level, as measured by the state assessment achievement levels. The possibility that test experiences in adaptive tests may not in practice be significantly different than in fixed-item tests is raised and discussed to explain the results of this and previous studies. %B Applied Psychological Measurement %V 41 %P 495-511 %U https://doi.org/10.1177/0146621617707556 %R 10.1177/0146621617707556 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T The Development of a Web-Based CAT in China %A Chongli Liang %A Danjun Wang %A Dan Zhou %A Peida Zhan %K China %K Web-Based CAT %X

Cognitive ability assessments are widely used as recruitment tools in hiring potential employees. Traditional cognitive ability tests, however, face threats from item exposure and long administration times. In China in particular, campus recruitment places a premium on short testing times and anti-cheating measures. Beisen, the largest domestic online assessment provider, developed a web-based CAT for cognitive ability that assesses verbal, quantitative, logical, and spatial ability, with the aims of reducing testing time, improving measurement accuracy, and reducing threats from cheating and faking in online ability testing. The web-based test is convenient for examinees, who can access it over the Internet simply by logging in to the test website at any time and place, from any Internet-enabled device (e.g., laptops, tablets, and smartphones).

We designed the CAT around strategies for establishing the item bank, setting the starting point, selecting items, scoring, and terminating the test, and we paid close attention to administering the test via the web. For the CAT procedures, we employed online calibration to build a stable and expanding item bank, and we integrated maximum Fisher information, the α-stratified strategy, and randomization to select items and control item exposure. Fixed-length and variable-length strategies were combined in the termination rule. To deliver fluid web-based testing, we employed cloud computing techniques and engineered each computational step carefully. Distributed computation performs EAP scoring and item selection at high speed. Caching all items on the servers in advance shortens the loading of items onto examinees’ devices. Horizontally scalable cloud servers cope with high concurrency. The heavy computation in item selection was converted into a lookup in a precomputed item-information table.

We examined average accuracy, bank usage, and computing performance under both laboratory and live testing conditions. In an administration to almost 28,000 examinees, average bank usage was 50%, and 80% of tests terminated at a test information of 10, with an average of 9.6. Under high concurrency, testing proceeded unhindered, and scoring and item selection together took an average of 0.23 s per examinee.


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T An Empirical Simulation Study Using mstR for MST Designs %A Soo Lee %K mstR %K multistage testing %X

Unlike other adaptive testing designs, multistage testing (MST) provides many of the benefits of both adaptive and linear testing, and it has recently become the most sought-after format for computerized testing in educational assessment. It is well suited to testing educational achievement and can be adapted to practical educational survey testing. However, operational implementation of an MST design raises many practical considerations, including costs and benefits. Practitioners need to start with simulations to evaluate various MST designs and their performance before implementation. mstR, a recently developed open-source R package, was released to support researchers and practitioners in their MST simulations for implementation.

A conventional MST design has a three-stage module structure (i.e., a 1-2-3 design); alternatively, the composition of modules diverges from one design to another (e.g., a 1-3 design). For advance planning of equivalence studies, this paper utilizes both the 1-2-3 design and the 1-3 design for the MST structures. To study these structures broadly, this paper evaluates the different MST designs through simulations using the R package mstR. The empirical simulation study provides an introductory overview of mstR and describes what it offers using different MST structures built from a 2PL item bank. Further comparisons show the advantages of the different MST designs (e.g., the 1-2-3 design and the 1-3 design) for different practical implementations.

Built on the open-source statistical environment R, mstR provides a powerful simulation tool that psychologists, social scientists, and educational measurement scientists can apply to innovative future assessments and to the operational use of MST.

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T An Imputation Approach to Handling Incomplete Computerized Tests %A Troy Chen %A Chi-Yu Huang %A Chunyan Liu %K CAT %K imputation approach %K incomplete computerized test %X

As technology advances, computerized adaptive testing (CAT) is becoming increasingly popular as it allows tests to be tailored to an examinee’s ability.  Nevertheless, examinees might devise testing strategies to use CAT to their advantage.  For instance, if only the items that examinees answer count towards their score, then a higher theta score might be obtained by spending more time on items at the beginning of the test and skipping items at the end if time runs out. This type of gaming can be discouraged if examinees’ scores are lowered or “penalized” based on the amount of non-response.

The goal of this study was to devise a penalty function that would meet two criteria: (1) the greater the omit rate, the greater the penalty, and (2) examinees with the same ability and the same omit rate should receive the same penalty. To create the penalty, theta was first calculated based on only the items the examinee responded to. Next, the expected number-correct score (EXR) was obtained using this theta and the test characteristic curve. A penalized expected number-correct score was obtained by multiplying EXR by the proportion of items the examinee responded to, and the corresponding penalized theta was identified from the test characteristic curve. Based on the penalized theta and the item parameters of an unanswered item, the likelihood of a correct response is computed and employed to estimate the imputed score for the unanswered item.

Two datasets were used to generate tests with completion rates of 50%, 80%, and 90%. The first dataset included real data where approximately 4,500 examinees responded to a 21-item test, which provided a baseline/truth. Sampling was done to achieve the three completion rate conditions. The second dataset consisted of simulated item scores for 50,000 simulees under a 1-2-4 multistage CAT design where each stage contained seven items. Imputed item scores for unanswered items were computed using a variety of values for G (and therefore T). Three other approaches to handling unanswered items were also considered: all correct (i.e., T = 0), all incorrect (i.e., T = 1), and random scoring (i.e., T = 0.5).

The current study investigated the impact on theta estimates resulting from the proposed approach to handling unanswered items in a fixed-length CAT. In real testing situations, when examinees do not finish a test, it is hard to tell whether they tried diligently but ran out of time or whether they attempted to manipulate the scoring engine.  To handle unfinished tests with penalties, the proposed approach considers examinees’ abilities and incompletion rates. The results of this study provide direction for psychometric practitioners when considering penalties for omitted responses.


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1vznZeO3nsZZK0k6_oyw5c9ZTP8uyGnXh %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Multi-stage Testing for a Multi-disciplined End-of primary-school Test %A Hendrik Straat %A Maaike van Groen %A Wobbe Zijlstra %A Marie-Anne Keizer-Mittelhaëuser %A Michel Lamoré %K mst %K Multidisciplined %K proficiency %X

The Dutch secondary education system consists of five levels: basic, lower, and middle vocational education, general secondary education, and pre-academic education. The individual decision on the level of secondary education is based on a combination of the teacher’s judgment and an end-of-primary-school placement test.

This placement test encompasses the measurement of reading, language, mathematics, and writing, with each skill consisting of one to four subdomains. The Dutch end-of-primary-school test is currently administered in two linear 200-item paper-based versions. The two versions differ in difficulty so as to motivate both less able and more able students, and to measure both groups of students precisely. The primary goal of the test is to provide placement advice for the five levels of secondary education. The secondary goal is the assessment of six different fundamental reference levels defined for reading, language, and mathematics. Because of the high-stakes nature of the test’s advice, the Dutch parliament has mandated a change to a multistage format. A major advantage of multistage testing is that the tailoring of the tests depends more strongly on the ability of the students than on the teacher’s judgment. A separate multistage test is under development for each of the three skills measured by the reference levels, to increase the classification accuracy for secondary education placement and to optimally measure performance on the reference-level-related skills.

This symposium consists of three presentations discussing the challenges of transitioning from a linear paper-based test to a computer-based multistage test within an existing curriculum, and the specification of the multistage test to meet its measurement purposes. The transition to a multistage test must improve both classification accuracy and measurement precision.

First, we describe the Dutch educational system and the role of the end-of-primary-school placement test within this system. Special attention is paid to the advantages of multistage testing over both linear testing and computerized adaptive testing, and to practical implications of the transition from a linear to a multistage test.

Second, we discuss routing and reporting on the new multistage test. Both topics have a major impact on the quality of the placement advice and the reference mastery decisions. Several methods for routing and reporting are compared.

Third, the linear test contains 200 items to cover a broad range of different skills and to obtain a precise measurement of each skill separately. Multistage testing creates opportunities to reduce the cognitive burden on students while maintaining the same quality of placement advice and assessment of mastery of the reference levels. This presentation focuses on the optimal allocation of items to test modules, the optimal number of stages and modules per stage, and test length reduction.


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1C5ys178p_Wl9eemQuIsI56IxDTck2z8P %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T New Challenges (With Solutions) and Innovative Applications of CAT %A Chun Wang %A David J. Weiss %A Xue Zhang %A Jian Tao %A Yinhong He %A Ping Chen %A Shiyu Wang %A Susu Zhang %A Haiyan Lin %A Xiaohong Gao %A Hua-Hua Chang %A Zhuoran Shang %K CAT %K challenges %K innovative applications %X

Over the past several decades, computerized adaptive testing (CAT) has profoundly changed the administration of large-scale aptitude tests, state-wide achievement tests, professional licensure exams, and health outcome measures. While many challenges of CAT have been successfully addressed due to the continual efforts of researchers in the field, there are still many remaining, longstanding challenges that have yet to be resolved. This symposium will begin with three presentations, each of which provides a sound solution to one of the unresolved challenges. They are (1) item calibration when responses are “missing not at random” from CAT administration; (2) online calibration of new items when person traits have non-ignorable measurement error; (3) establishing consistency and asymptotic normality of latent trait estimation when allowing item response revision in CAT. In addition, this symposium also features innovative applications of CAT. In particular, there is emerging interest in using cognitive diagnostic CAT to monitor and detect learning progress (4th presentation). Last but not least, the 5th presentation illustrates the power of multidimensional polytomous CAT that permits rapid identification of hospitalized patients’ rehabilitative care needs in health outcomes measurement. We believe this symposium covers a wide range of interesting and important topics in CAT.

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1Wvgxw7in_QCq_F7kzID6zCZuVXWcFDPa %0 Journal Article %J Applied Psychological Measurement %D 2017 %T Projection-Based Stopping Rules for Computerized Adaptive Testing in Licensure Testing %A Luo, Xiao %A Kim, Doyoung %A Dickison, Philip %X The confidence interval (CI) stopping rule is commonly used in licensure settings to make classification decisions with fewer items in computerized adaptive testing (CAT). However, it tends to be less efficient in the near-cut regions of the θ scale, as the CI often fails to be narrow enough for an early termination decision prior to reaching the maximum test length. To solve this problem, this study proposed projection-based stopping rules that base the termination decision on the algorithmically projected range of the final θ estimate at the hypothetical completion of the CAT. A simulation study and an empirical study were conducted to show the advantages of the projection-based rules over the CI rule: the projection-based rules reduced the test length without jeopardizing critical psychometric qualities of the test, such as θ estimation precision and classification precision. Operationally, these rules do not require additional regularization parameters, because the projection is simply a hypothetical extension of the current test within the existing CAT environment. Because these new rules are specifically designed to address the decreased efficiency in the near-cut regions rather than across the entire scale, the authors recommend using them in conjunction with the CI rule in practice. %B Applied Psychological Measurement %V 42 %P 275 - 290 %8 2018/06/01 %@ 0146-6216 %U https://doi.org/10.1177/0146621617726790 %N 4 %!
Applied Psychological Measurement %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Scripted On-the-fly Multistage Testing %A Edison Choe %A Bruce Williams %A Sung-Hyuck Lee %K CAT %K multistage testing %K On-the-fly testing %X

On-the-fly multistage testing (OMST) was introduced recently as a promising alternative to preassembled MST. A decidedly appealing feature of both is the reviewability of items within the current stage. However, the fundamental difference is that, instead of routing to a preassembled module, OMST adaptively assembles a module at each stage according to an interim ability estimate. This produces more individualized forms with finer measurement precision, but imposing nonstatistical constraints and controlling item exposure become more cumbersome. One recommendation is to use the maximum priority index followed by a remediation step to satisfy content constraints, and the Sympson-Hetter method with a stratified item bank for exposure control.

However, these methods can be computationally expensive, thereby impeding practical implementation. Therefore, this study investigated the script method as a simpler solution to the challenge of strict content balancing and effective item exposure control in OMST. The script method was originally devised as an item selection algorithm for CAT and generally proceeds as follows: For a test with m items, there are m slots to be filled, and an item is selected according to pre-defined rules for each slot. For the first slot, randomly select an item from a designated content area (collection). For each subsequent slot, 1) Discard any enemies of items already administered in previous slots; 2) Draw a designated number of candidate items (selection length) from the designated collection according to the current ability estimate; 3) Randomly select one item from the set of candidates. There are two distinct features of the script method. First, a predetermined sequence of collections guarantees meeting content specifications. The specific ordering may be determined either randomly or deliberately by content experts. Second, steps 2 and 3 depict a method of exposure control, in which selection length balances item usage at the possible expense of ability estimation accuracy. The adaptation of the script method to OMST is straightforward. For the first module, randomly select each item from a designated collection. For each subsequent module, the process is the same as in scripted CAT (SCAT) except the same ability estimate is used for the selection of all items within the module. A series of simulations was conducted to evaluate the performance of scripted OMST (SOMST, with 3 or 4 evenly divided stages) relative to SCAT under various item exposure restrictions. In all conditions, reliability was maximized by programming an optimization algorithm that searches for the smallest possible selection length for each slot within the constraints. 
Preliminary results indicated that SOMST is a capable design, with performance comparable to that of SCAT. These encouraging findings, together with the ease of implementation, make the method a promising candidate for operational use in large-scale assessments.

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1wKuAstITLXo6BM4APf2mPsth1BymNl-y %0 Journal Article %J Applied Psychological Measurement %D 2016 %T Exploration of Item Selection in Dual-Purpose Cognitive Diagnostic Computerized Adaptive Testing: Based on the RRUM %A Dai, Buyun %A Zhang, Minqiang %A Li, Guangming %X Cognitive diagnostic computerized adaptive testing (CD-CAT) can be divided into two broad categories: (a) single-purpose tests, which are based on the subject’s knowledge state (KS) alone, and (b) dual-purpose tests, which are based on both the subject’s KS and traditional ability level (θ). This article seeks to identify the most efficient item selection method for the latter type of CD-CAT under various conditions and evaluation criteria, based on the reduced reparameterized unified model (RRUM) and the two-parameter logistic model of item response theory (IRT-2PLM). The Shannon entropy (SHE) and Fisher information methods were combined to produce a new synthetic item selection index, the “dapperness with information (DWI)” index, which concurrently considers both KS and θ within one step. The new method was compared with four other methods. The results showed that, in most conditions, the new method exhibited the best performance in terms of KS estimation and the second-best performance in terms of θ estimation. Item utilization uniformity and computing time are also considered for all the competing methods.
%B Applied Psychological Measurement %V 40 %P 625-640 %U http://apm.sagepub.com/content/40/8/625.abstract %R 10.1177/0146621616666008 %0 Journal Article %J Journal of Educational Measurement %D 2016 %T Hybrid Computerized Adaptive Testing: From Group Sequential Design to Fully Sequential Design %A Wang, Shiyu %A Lin, Haiyan %A Chang, Hua-Hua %A Douglas, Jeff %X Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST exhibit strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better, because different modes may fit different practical situations. This article proposes a hybrid adaptive framework to combine CAT and MST, inspired by an analysis of the history of both designs. The proposed procedure is a design that transitions from a group sequential design to a fully sequential design. This allows for the robustness of MST in early stages, but also shares the advantages of CAT in later stages, with fine-tuning of the ability estimator once its neighborhood has been identified. Simulation results showed that hybrid designs following the proposed principles provided comparable or even better estimation accuracy and efficiency than standard CAT and MST designs, especially for examinees at the two ends of the ability range. %B Journal of Educational Measurement %V 53 %P 45–62 %U http://dx.doi.org/10.1111/jedm.12100 %R 10.1111/jedm.12100 %0 Journal Article %J Journal of Educational Measurement %D 2016 %T Monitoring Items in Real Time to Enhance CAT Security %A Zhang, Jinming %A Li, Jie %X An IRT-based sequential procedure is developed to monitor items for enhancing test security.
The procedure uses a series of statistical hypothesis tests to examine whether the statistical characteristics of each item under inspection have changed significantly during CAT administration. This procedure is compared with a previously developed CTT-based procedure through simulation studies. The results show that when the total number of examinees is fixed, both procedures can control the rate of type I errors at any reasonable significance level by choosing an appropriate cutoff point while maintaining a low rate of type II errors. Further, the IRT-based method has a much lower type II error rate (i.e., greater power) than the CTT-based method when the number of compromised items is small (e.g., 5), which can be achieved if the IRT-based procedure is applied in an active mode, in the sense that flagged items can be replaced with new items. %B Journal of Educational Measurement %V 53 %P 131–151 %U http://dx.doi.org/10.1111/jedm.12104 %R 10.1111/jedm.12104 %0 Journal Article %J Applied Psychological Measurement %D 2016 %T Optimal Reassembly of Shadow Tests in CAT %A Choi, Seung W. %A Moellering, Karin T. %A Li, Jie %A van der Linden, Wim J. %X Even in the age of abundant and fast computing resources, concurrency requirements for large-scale online testing programs still put an uninterrupted delivery of computer-adaptive tests at risk. In this study, to increase the concurrency for operational programs that use the shadow-test approach to adaptive testing, we explored various strategies aimed at reducing the number of reassembled shadow tests without compromising the measurement quality. Strategies requiring fixed intervals between reassemblies, a certain minimal change in the interim ability estimate since the last assembly before triggering a reassembly, and a hybrid of the two strategies yielded substantial reductions in the number of reassemblies without degradation in the measurement accuracy.
The strategies effectively prevented unnecessary reassemblies due to adapting to the noise in the early test stages. They also highlighted the practicality of the shadow-test approach by minimizing the computational load involved in its use of mixed-integer programming. %B Applied Psychological Measurement %V 40 %P 469-485 %U http://apm.sagepub.com/content/40/7/469.abstract %R 10.1177/0146621616654597 %0 Journal Article %J Applied Psychological Measurement %D 2015 %T The Effect of Upper and Lower Asymptotes of IRT Models on Computerized Adaptive Testing %A Cheng, Ying %A Liu, Cheng %X In this article, the effect of the upper and lower asymptotes in item response theory models on computerized adaptive testing is shown analytically. This is done by deriving the step size between adjacent latent trait estimates under the four-parameter logistic model (4PLM) and two models it subsumes, the usual three-parameter logistic model (3PLM) and the 3PLM with upper asymptote (3PLMU). The authors show analytically that the large effect of the discrimination parameter on the step size holds true for the 4PLM and the two models it subsumes under both the maximum information method and the b-matching method for item selection. Furthermore, the lower asymptote helps reduce the positive bias of ability estimates associated with early guessing, and the upper asymptote helps reduce the negative bias induced by early slipping. Relative step size between modeling versus not modeling the upper or lower asymptote under the maximum Fisher information method (MI) and the b-matching method is also derived. It is also shown analytically why the gain from early guessing is smaller than the loss from early slipping when the lower asymptote is modeled, and vice versa when the upper asymptote is modeled. The benefit to loss ratio is quantified under both the MI and the b-matching method. Implications of the analytical results are discussed. 
%B Applied Psychological Measurement %V 39 %P 551-565 %U http://apm.sagepub.com/content/39/7/551.abstract %R 10.1177/0146621615585850 %0 Journal Article %J Educational and Psychological Measurement %D 2015 %T Investigation of Response Changes in the GRE Revised General Test %A Liu, Ou Lydia %A Bridgeman, Brent %A Gu, Lixiong %A Xu, Jun %A Kong, Nan %X Research on examinees’ response changes on multiple-choice tests over the past 80 years has yielded some consistent findings, including that most examinees make score gains by changing answers. This study expands the research on response changes by focusing on a high-stakes admissions test—the Verbal Reasoning and Quantitative Reasoning measures of the GRE revised General Test. We analyzed data from 8,538 examinees for Quantitative and 9,140 for Verbal sections who took the GRE revised General Test in 12 countries. The analyses yielded findings consistent with prior research. In addition, as examinees’ ability increases, the benefit of response changing increases. The study yielded significant implications for both test agencies and test takers. Computer adaptive tests often do not allow the test takers to review and revise. Findings from this study confirm the benefit of such features. %B Educational and Psychological Measurement %V 75 %P 1002-1020 %U http://epm.sagepub.com/content/75/6/1002.abstract %R 10.1177/0013164415573988 %0 Journal Article %J Applied Psychological Measurement %D 2015 %T Online Item Calibration for Q-Matrix in CD-CAT %A Chen, Yunxiao %A Liu, Jingchen %A Ying, Zhiliang %X

Item replenishment is important for maintaining a large-scale item bank. In this article, the authors consider calibrating new items based on pre-calibrated operational items under the deterministic inputs, noisy-and-gate (DINA) model, the specification of which includes the so-called Q-matrix, as well as the slipping and guessing parameters. Making use of the maximum likelihood and Bayesian estimators for the latent knowledge states, the authors propose two methods for the calibration. These methods are applicable both to traditional paper-and-pencil-based tests, for which the selection of operational items is prefixed, and to computerized adaptive tests, for which the selection of operational items is sequential and random. Extensive simulations are conducted to assess and compare the performance of these approaches. Extensions to other diagnostic classification models are also discussed.

%B Applied Psychological Measurement %V 39 %P 5-15 %U http://apm.sagepub.com/content/39/1/5.abstract %R 10.1177/0146621613513065 %0 Journal Article %J International Journal of Testing %D 2015 %T Using Out-of-Level Items in Computerized Adaptive Testing %A Wei,H. %A Lin,J. %X Out-of-level testing refers to the practice of assessing a student with a test that is intended for students at a higher or lower grade level. Although the appropriateness of out-of-level testing for accountability purposes has been questioned by educators and policymakers, incorporating out-of-level items in formative assessments for accurate feedback is recommended. This study made use of a commercial item bank with vertically scaled items across grades and simulated student responses in a computerized adaptive testing (CAT) environment. Results of the study suggested that administration of out-of-level items improved measurement accuracy and test efficiency for students who perform significantly above or below their grade-level peers. This study has direct implications with regards to the relevance, applicability, and benefits of using out-of-level items in CAT. %B International Journal of Testing %V 15 %N 1 %R http://dx.doi.org/10.1080/15305058.2014.979492 %0 Book %D 2014 %T Computerized multistage testing: Theory and applications %A Duanli Yan %A Alina A von Davier %A Charles Lewis %I CRC Press %C Boca Raton FL %@ 13-978-1-4665-0577-3 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2014 %T General Test Overlap Control: Improved Algorithm for CAT and CCT %A Chen, Shu-Ying %A Lei, Pui-Wa %A Chen, Jyun-Hong %A Liu, Tzu-Chen %X

This article proposes a new online test overlap control algorithm that improves on Chen’s algorithm for controlling the general test overlap rate for item pooling among a group of examinees. Chen’s algorithm is inefficient in that it controls not only item pooling between the current examinee and prior examinees but also item pooling among previous examinees, which was already controlled for when those examinees were themselves current. The proposed improvement increases efficiency by considering only item pooling between the current and previous examinees, and its improved performance over Chen’s algorithm is demonstrated in a simulated computerized adaptive testing (CAT) environment. Moreover, the proposed algorithm is adapted for computerized classification testing (CCT) using the sequential probability ratio test procedure and is evaluated against some existing exposure control procedures. The proposed algorithm appears to work best in controlling the general test overlap rate among the exposure control procedures examined, without sacrificing much classification precision, though longer tests might be required for more stringent control of item pooling among larger groups. Given the capability of the proposed algorithm to control item pooling among a group of examinees of any size and its ease of implementation, it appears to be a good test overlap control method.

%B Applied Psychological Measurement %V 38 %P 229-244 %U http://apm.sagepub.com/content/38/3/229.abstract %R 10.1177/0146621613513494 %0 Journal Article %J Educational and Psychological Measurement %D 2013 %T A Comparison of Exposure Control Procedures in CATs Using the 3PL Model %A Leroux, Audrey J. %A Lopez, Myriam %A Hembry, Ian %A Dodd, Barbara G. %X

This study compares the progressive-restricted standard error (PR-SE) exposure control procedure to three commonly used procedures in computerized adaptive testing, the randomesque, Sympson–Hetter (SH), and no exposure control methods. The performance of these four procedures is evaluated using the three-parameter logistic model under the manipulated conditions of item pool size (small vs. large) and stopping rules (fixed-length vs. variable-length). PR-SE provides the advantage of similar constraints to SH, without the need for a preceding simulation study to execute it. Overall for the large and small item banks, the PR-SE method administered almost all of the items from the item pool, whereas the other procedures administered about 52% or less of the large item bank and 80% or less of the small item bank. The PR-SE yielded the smallest amount of item overlap between tests across conditions and administered fewer items on average than SH. PR-SE obtained these results with similar, and acceptable, measurement precision compared to the other exposure control procedures while vastly improving on item pool usage.

%B Educational and Psychological Measurement %V 73 %P 857-874 %U http://epm.sagepub.com/content/73/5/857.abstract %R 10.1177/0013164413486802 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2013 %T Estimating Measurement Precision in Reduced-Length Multi-Stage Adaptive Testing %A Crotts, K.M. %A Zenisky, A. L. %A Sireci, S.G. %A Li, X. %B Journal of Computerized Adaptive Testing %V 1 %P 67-87 %G English %N 4 %R 10.7333/1309-0104067 %0 Journal Article %J Applied Psychological Measurement %D 2013 %T Integrating Test-Form Formatting Into Automated Test Assembly %A Diao, Qi %A van der Linden, Wim J. %X

Automated test assembly uses the methodology of mixed integer programming to select an optimal set of items from an item bank. Automated test-form generation uses the same methodology to optimally order the items and format the test form. From an optimization point of view, production of fully formatted test forms directly from the item pool using a simultaneous optimization model is more attractive than any of the current, more time-consuming two-stage processes. The goal of this study was to provide such simultaneous models both for computer-delivered and paper forms, as well as explore their performances relative to two-stage optimization. Empirical examples are presented to show that it is possible to automatically produce fully formatted optimal test forms directly from item pools up to some 2,000 items on a regular PC in realistic times.

%B Applied Psychological Measurement %V 37 %P 361-374 %U http://apm.sagepub.com/content/37/5/361.abstract %R 10.1177/0146621613476157 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2013 %T Item Ordering in Stochastically Curtailed Health Questionnaires With an Observable Outcome %A Finkelman, M. D. %A Kim, W. %A He, Y. %A Lai, A.M. %B Journal of Computerized Adaptive Testing %V 1 %P 38-66 %G en %N 3 %R 10.7333/1304-0103038 %0 Journal Article %J Applied Psychological Measurement %D 2013 %T The Random-Threshold Generalized Unfolding Model and Its Application of Computerized Adaptive Testing %A Wang, Wen-Chung %A Liu, Chen-Wei %A Wu, Shiu-Lien %X

The random-threshold generalized unfolding model (RTGUM) was developed by treating the thresholds in the generalized unfolding model as random effects rather than fixed effects to account for the subjective nature of the selection of categories in Likert items. The parameters of the new model can be estimated with the JAGS (Just Another Gibbs Sampler) freeware, which adopts a Bayesian approach for estimation. A series of simulations was conducted to evaluate the parameter recovery of the new model and the consequences of ignoring the randomness in thresholds. The results showed that the parameters of RTGUM were recovered fairly well and that ignoring the randomness in thresholds led to biased estimates. Computerized adaptive testing was also implemented on RTGUM, where the Fisher information criterion was used for item selection and the maximum a posteriori method was used for ability estimation. The simulation study showed that the longer the test length, the smaller the randomness in thresholds, and the more categories in an item, the more precise the ability estimates would be.

%B Applied Psychological Measurement %V 37 %P 179-200 %U http://apm.sagepub.com/content/37/3/179.abstract %R 10.1177/0146621612469720 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2013 %T Speededness and Adaptive Testing %A van der Linden, Wim J. %A Xiong, Xinhui %X

Two simple constraints on the item parameters in a response–time model are proposed to control the speededness of an adaptive test. As the constraints are additive, they can easily be included in the constraint set for a shadow-test approach (STA) to adaptive testing. Alternatively, a simple heuristic is presented to control speededness in plain adaptive testing without any constraints. Both types of control are easy to implement and do not require any other real-time parameter estimation during the test than the regular update of the test taker’s ability estimate. Evaluation of the two approaches using simulated adaptive testing showed that the STA was especially effective. It guaranteed testing times that differed less than 10 seconds from a reference test across a variety of conditions.

%B Journal of Educational and Behavioral Statistics %V 38 %P 418-438 %U http://jeb.sagepub.com/cgi/content/abstract/38/4/418 %R 10.3102/1076998612466143 %0 Journal Article %J Educational and Psychological Measurement %D 2012 %T Comparison Between Dichotomous and Polytomous Scoring of Innovative Items in a Large-Scale Computerized Adaptive Test %A Jiao, H. %A Liu, J. %A Haynie, K. %A Woo, A. %A Gorham, J. %X

This study explored the impact of partial credit scoring of one type of innovative item (multiple-response items) in the pretest and operational settings of a computerized adaptive version of a large-scale licensure examination. The impacts of partial credit scoring on the estimation of ability parameters and on classification decisions in operational test settings were explored in one real data analysis and two simulation studies, in which two different polytomous scoring algorithms, automated polytomous scoring and rater-generated polytomous scoring, were applied. For the real data analyses, the ability estimates from dichotomous and polytomous scoring were highly correlated; the classification consistency between different scoring algorithms was nearly perfect. Information distribution changed slightly in the operational item bank. In the two simulation studies comparing each polytomous scoring algorithm with dichotomous scoring, the ability estimates resulting from polytomous scoring had slightly higher measurement precision than those resulting from dichotomous scoring. The practical impact on classification decisions was minor because of the extremely small number of items that could be scored polytomously in this study.

%B Educational and Psychological Measurement %V 72 %P 493-509 %G eng %R 10.1177/0013164411422903 %0 Journal Article %J Educational and Psychological Measurement %D 2012 %T Comparison of Exposure Controls, Item Pool Characteristics, and Population Distributions for CAT Using the Partial Credit Model %A Lee, HwaYoung %A Dodd, Barbara G. %X

This study investigated item exposure control procedures under various combinations of item pool characteristics and ability distributions in computerized adaptive testing based on the partial credit model. Three variables were manipulated: item pool characteristics (120 items for each of easy, medium, and hard item pools), two ability distributions (normally distributed and negatively skewed data), and three exposure control procedures (randomesque procedure, progressive–restricted procedure, and maximum information procedure). A number of measurement precision indexes such as descriptive statistics, correlations between known and estimated ability levels, bias, root mean squared error, and average absolute difference, exposure rates, item usage, and item overlap were computed to assess the impact of matched or nonmatched item pool and ability distributions on the accuracy of ability estimation and the performance of exposure control procedures. As expected, the medium item pool produced better precision of measurement than both the easy and hard item pools. The progressive–restricted procedure performed better in terms of maximum exposure rates, item average overlap, and pool utilization than both the randomesque procedure and the maximum information procedure. The easy item pool with the negatively skewed data as a mismatched condition produced the worst performance.

%B Educational and Psychological Measurement %V 72 %P 159-175 %U http://epm.sagepub.com/content/72/1/159.abstract %R 10.1177/0013164411411296 %0 Journal Article %J Applied Psychological Measurement %D 2012 %T An Empirical Evaluation of the Slip Correction in the Four Parameter Logistic Models With Computerized Adaptive Testing %A Yen, Yung-Chin %A Ho, Rong-Guey %A Laio, Wen-Wei %A Chen, Li-Ju %A Kuo, Ching-Chin %X

In a selected response test, aberrant responses such as careless errors and lucky guesses might cause error in ability estimation because these responses do not actually reflect the knowledge that examinees possess. In a computerized adaptive test (CAT), these aberrant responses could further cause serious estimation error due to dynamic item administration. To enhance the robust performance of CAT against aberrant responses, Barton and Lord proposed the four-parameter logistic (4PL) item response theory (IRT) model. However, most studies relevant to the 4PL IRT model were conducted based on simulation experiments. This study attempts to investigate the performance of the 4PL IRT model as a slip-correction mechanism with an empirical experiment. The results showed that the 4PL IRT model could not only reduce the problematic underestimation of the examinees’ ability introduced by careless mistakes in practical situations but also improve measurement efficiency.

%B Applied Psychological Measurement %V 36 %P 75-87 %U http://apm.sagepub.com/content/36/2/75.abstract %R 10.1177/0146621611432862 %0 Journal Article %J Journal of Educational Measurement %D 2012 %T Investigating the Effect of Item Position in Computer-Based Tests %A Li, Feiming %A Cohen, Allan %A Shen, Linjun %X

Computer-based tests (CBTs) often use random ordering of items in order to minimize item exposure and reduce the potential for answer copying. Little research has been done, however, to examine item position effects for these tests. In this study, different versions of a Rasch model and different response time models were examined and applied to data from a CBT administration of a medical licensure examination. The models specifically were used to investigate whether item position affected item difficulty and item intensity estimates. Results indicated that the position effect was negligible.

%B Journal of Educational Measurement %V 49 %P 362–379 %U http://dx.doi.org/10.1111/j.1745-3984.2012.00181.x %R 10.1111/j.1745-3984.2012.00181.x %0 Journal Article %J Applied Psychological Measurement %D 2012 %T A Mixture Rasch Model–Based Computerized Adaptive Test for Latent Class Identification %A Hong Jiao, %A Macready, George %A Liu, Junhui %A Cho, Youngmi %X

This study explored a computerized adaptive test delivery algorithm for latent class identification based on the mixture Rasch model. Four item selection methods based on the Kullback–Leibler (KL) information were proposed and compared with the reversed and the adaptive KL information under simulated testing conditions. When item separation was large, the item selection methods did not differ appreciably in the accuracy of classifying examinees into latent classes or of estimating latent ability. However, when item separation was small, the two methods with class-specific ability estimates performed better than the other two methods, which were based on a single latent ability estimate across all latent classes. The three types of KL information distributions were compared. The KL and the reversed KL information could be the same or different depending on the ability level and the item difficulty difference between latent classes. Although the KL information and the reversed KL information differed at some ability levels and item difficulty difference levels, the use of the KL, the reversed KL, or the adaptive KL information did not affect the results substantially, due to the symmetric distribution of item difficulty differences between latent classes in the simulated item pools. Item pool usage and classification convergence points were examined as well.

%B Applied Psychological Measurement %V 36 %P 469-493 %U http://apm.sagepub.com/content/36/6/469.abstract %R 10.1177/0146621612450068 %0 Journal Article %J Applied Psychological Measurement %D 2012 %T A Stochastic Method for Balancing Item Exposure Rates in Computerized Classification Tests %A Huebner, Alan %A Li, Zhushan %X

Computerized classification tests (CCTs) classify examinees into categories such as pass/fail, master/nonmaster, and so on. This article proposes the use of stochastic methods from sequential analysis to address item overexposure, a practical concern in operational CCTs. Item overexposure is traditionally dealt with in CCTs by the Sympson-Hetter (SH) method, but this method is unable to restrict the exposure of the most informative items to the desired level. The authors’ new method of stochastic item exposure balance (SIEB) works in conjunction with the SH method and is shown to greatly reduce the number of overexposed items in a pool and improve overall exposure balance while maintaining classification accuracy comparable with using the SH method alone. The method is demonstrated using a simulation study.

%B Applied Psychological Measurement %V 36 %P 181-188 %U http://apm.sagepub.com/content/36/3/181.abstract %R 10.1177/0146621612439932 %0 Journal Article %J Educational and Psychological Measurement %D 2011 %T Computerized Classification Testing Under the Generalized Graded Unfolding Model %A Wang, Wen-Chung %A Liu, Chen-Wei %X

The generalized graded unfolding model (GGUM) has been recently developed to describe item responses to Likert items (agree—disagree) in attitude measurement. In this study, the authors (a) developed two item selection methods in computerized classification testing under the GGUM, the current estimate/ability confidence interval method and the cut score/sequential probability ratio test method, and (b) evaluated their accuracy and efficiency in classification through simulations. The results indicated that both methods were very accurate and efficient. The more points each item had and the fewer the classification categories, the more accurate and efficient the classification would be. However, the latter method may yield very low accuracy for dichotomous items with a short maximum test length. Thus, if it is to be used to classify examinees with dichotomous items, the maximum test length should be increased.
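In its simplest dichotomous form, the cut score/sequential probability ratio test method accumulates a log likelihood ratio between two abilities bracketing the cut score and stops once it crosses a decision bound. The sketch below uses a Rasch response model in place of the GGUM; the function names, default bounds, and that substitution are ours, not the authors'.

```python
import math

def sprt_decision(responses, difficulties, cut=0.0, delta=0.5,
                  alpha=0.05, beta=0.05):
    """Sequential probability ratio test for pass/fail classification
    under a dichotomous Rasch model (illustrative sketch).
    responses: list of 0/1 item scores; difficulties: matching b values.
    Returns "pass", "fail", or "continue"."""
    def p(theta, b):  # Rasch probability of a correct response
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    theta0, theta1 = cut - delta, cut + delta  # indifference region edges
    llr = 0.0  # log likelihood ratio of theta1 vs. theta0
    for x, b in zip(responses, difficulties):
        p0, p1 = p(theta0, b), p(theta1, b)
        llr += x * math.log(p1 / p0) + (1 - x) * math.log((1 - p1) / (1 - p0))
    upper = math.log((1 - beta) / alpha)   # Wald's decision bounds
    lower = math.log(beta / (1 - alpha))
    if llr >= upper:
        return "pass"
    if llr <= lower:
        return "fail"
    return "continue"
```

Under the GGUM, only the response probability function changes; the accumulation and stopping logic stay the same.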

%B Educational and Psychological Measurement %V 71 %P 114-128 %U http://epm.sagepub.com/content/71/1/114.abstract %R 10.1177/0013164410391575 %0 Journal Article %J Educational and Psychological Measurement %D 2011 %T Item Selection Criteria With Practical Constraints for Computerized Classification Testing %A Lin, Chuan-Ju %X

This study compares four item selection criteria for two-category computerized classification testing: (1) Fisher information (FI), (2) Kullback—Leibler information (KLI), (3) weighted log-odds ratio (WLOR), and (4) mutual information (MI), with respect to the efficiency and accuracy of classification decisions using the sequential probability ratio test, as well as the extent of item usage. The comparability of the four item selection criteria is examined primarily under three types of item selection conditions: (1) using only the four item selection algorithms, (2) using the four item selection algorithms and content balancing control, and (3) using the four item selection algorithms, content balancing control, and item exposure control. The comparability of the four item selection criteria is also evaluated under two types of proficiency distributions and three levels of indifference region width. The results show that the differences among the four item selection criteria are washed out as more realistic constraints are imposed. Moreover, within two-category classification testing, the use of MI does not necessarily generate greater efficiency than FI, WLOR, and KLI, although MI might seem attractive for its general form of formula in item selection.
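Of the four criteria, Fisher information is the simplest to state: for a 2PL item it equals a²p(1−p) at the current ability estimate, and FI-based selection takes the available item maximizing it. A sketch under those assumptions (function names and the (a, b)-tuple pool layout are ours):

```python
import math

def fisher_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def max_info_item(theta, items):
    """Index of the most informative item in a pool of (a, b) pairs."""
    return max(range(len(items)),
               key=lambda i: fisher_info(theta, *items[i]))
```

With equal discriminations, the criterion simply picks the item whose difficulty is closest to the interim theta, which is why unconstrained FI selection overexposes highly discriminating, well-targeted items.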

%B Educational and Psychological Measurement %V 71 %P 20-36 %U http://epm.sagepub.com/content/71/1/20.abstract %R 10.1177/0013164410387336 %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T The Use of Decision Trees for Adaptive Item Selection and Score Estimation %A Barth B. Riley %A Rodney Funk %A Michael L. Dennis %A Richard D. Lennox %A Matthew Finkelman %K adaptive item selection %K CAT %K decision tree %X

Post-hoc simulations compared the relative efficiency and precision of decision trees (using CHAID and CART) versus IRT-based CAT.

Conclusions

Decision tree methods were more efficient than CAT, but with a caveat. CAT selects items on two criteria: item location relative to the current estimate of theta, and item discrimination. Decision trees instead select items that best discriminate between groups defined by the total score, and CAT is optimal only when the trait level is well estimated. Findings suggest that combining decision tree item selection followed by CAT item selection may be advantageous.
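The hybrid the findings point to can be caricatured in a few lines: score a short fixed starting block with a tree-style split on the total score, then hand the resulting provisional theta to a CAT item selector. The thresholds and names below are hypothetical, not taken from the paper.

```python
def tree_route(first_responses):
    """Map the 0/1 scores of an initial fixed item block to a
    provisional theta via a total-score split (a stand-in for a
    CHAID/CART routing node); CAT selection would continue from it."""
    total, n = sum(first_responses), len(first_responses)
    if total == 0:
        return -1.0   # all wrong: start CAT low
    if total == n:
        return 1.0    # all right: start CAT high
    return 0.0        # mixed: start near the middle
```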

%B Annual Conference of the International Association for Computerized Adaptive Testing %8 10/2011 %G eng %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Multistage Testing: Issues, Designs, and Research %A Zenisky, A. L. %A Hambleton, R. K. %A Luecht, RM %B Elements of Adaptive Testing %P 355-372 %G eng %& 18 %R 10.1007/978-0-387-85461-8 %0 Journal Article %J Psychometrika %D 2010 %T Online calibration via variable length computerized adaptive testing %A Chang, Y. I. %A Lu, H. Y. %X Item calibration is an essential issue in modern item response theory based psychological or educational testing. Due to the popularity of computerized adaptive testing, methods to efficiently calibrate new items have become more important than that in the time when paper and pencil test administration is the norm. There are many calibration processes being proposed and discussed from both theoretical and practical perspectives. Among them, the online calibration may be one of the most cost effective processes. In this paper, under a variable length computerized adaptive testing scenario, we integrate the methods of adaptive design, sequential estimation, and measurement error models to solve online item calibration problems. The proposed sequential estimate of item parameters is shown to be strongly consistent and asymptotically normally distributed with a prechosen accuracy. Numerical results show that the proposed method is very promising in terms of both estimation accuracy and efficiency. The results of using calibrated items to estimate the latent trait levels are also reported. %B Psychometrika %V 75 %P 140-157 %@ 0033-3123 %G eng %0 Book Section %D 2009 %T Adaptive computer-based tasks under an assessment engineering paradigm %A Luecht, RM %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. 
%G eng %0 Journal Article %J Computers and Education %D 2009 %T An adaptive testing system for supporting versatile educational assessment %A Huang, Y-M. %A Lin, Y-T. %A Cheng, S-C. %K Architectures for educational technology system %K Distance education and telelearning %X With the rapid growth of computer and mobile technology, it is a challenge to integrate computer-based testing (CBT) with mobile learning (m-learning), especially for formative assessment and self-assessment. For self-assessment, computerized adaptive testing (CAT) is an appropriate way to enable students to evaluate themselves. In CAT, students are assessed through a process that uses item response theory (IRT), a well-founded psychometric theory. Furthermore, a large item bank is indispensable to a test, but when a CAT system has a large item bank, IRT-based item selection becomes computationally demanding. Besides a large item bank, an item exposure control mechanism is also essential to a testing system. Standard IRT-based CAT, however, does not by itself address these requirements, which motivated the authors to carry out this study. This paper describes the design, development, and implementation of an adaptive testing system. The system supports several assessment functions and different devices. Moreover, the researchers apply a novel approach, particle swarm optimization (PSO), to alleviate the computational complexity and resolve the problem of item exposure. Throughout the development of the system, a formative evaluation was embedded as an integral part of the design methodology and used to improve the system. After the system was formally released onto the web, questionnaires and experiments were conducted to evaluate the usability, precision, and efficiency of the system. The results of these evaluations indicated that the system provides adaptive testing for different devices and supports versatile assessment functions.
Moreover, the system can estimate students' ability reliably and validly and conduct an adaptive test efficiently. Furthermore, the computational complexity of the system was alleviated by the PSO approach. With this approach, the test item selection procedure becomes efficient, and the average best fitness values are very close to the optimal solutions. %B Computers and Education %V 52 %P 53-67 %@ 0360-1315 %G eng %0 Book Section %D 2009 %T Adequacy of an item pool measuring proficiency in English language to implement a CAT procedure %A Karino, C. A. %A Costa, D. R. %A Laros, J. A. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Applied Psychological Measurement %D 2009 %T I've Fallen and I Can't Get Up: Can High-Ability Students Recover From Early Mistakes in CAT? %A Rulison, Kelly L. %A Loken, Eric %X

A difficult result to interpret in Computerized Adaptive Tests (CATs) occurs when an ability estimate initially drops and then ascends continuously until the test ends, suggesting that the true ability may be higher than implied by the final estimate. This study explains why this asymmetry occurs and shows that early mistakes by high-ability students can lead to considerable underestimation, even in tests with 45 items. The opposite response pattern, where low-ability students start with lucky guesses, leads to much less bias. The authors show that using Barton and Lord's four-parameter model (4PM) and a less informative prior can lower bias and root mean square error (RMSE) for high-ability students with a poor start, as the CAT algorithm ascends more quickly after initial underperformance. Results also show that the 4PM slightly outperforms a CAT in which less discriminating items are initially used. The practical implications and relevance for psychological measurement more generally are discussed.
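Barton and Lord's four-parameter model adds an upper asymptote d < 1 to the usual logistic curve, so a single early wrong answer never drives the likelihood of a high theta to near zero. A sketch of the response function (the function name and defaults are ours):

```python
import math

def p_4pl(theta, a, b, c=0.0, d=1.0):
    """Four-parameter logistic probability of a correct response:
    c is the lower (guessing) asymptote, d the upper (slipping)
    asymptote; c = 0 and d = 1 recover the 2PL."""
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (d - c) * logistic
```

With d = 0.98, even a very able examinee misses an item with probability about 0.02, which is what lets the CAT algorithm ascend quickly after a poor start.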

%B Applied Psychological Measurement %V 33 %P 83-101 %U http://apm.sagepub.com/content/33/2/83.abstract %R 10.1177/0146621608324023 %0 Book Section %D 2009 %T Limiting item exposure for target difficulty ranges in a high-stakes CAT %A Li, X. %A Becker, K. %A Gorham, J. %A Woo, A. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. {PDF File, 1. %G eng %0 Book Section %D 2009 %T Optimizing item exposure control algorithms for polytomous computerized adaptive tests with restricted item banks %A Chajewski, M. %A Lewis, C. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T Practical issues concerning the application of the DINA model to CAT data %A Huebner, A. %A Wang, B. %A Lee, S. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Applied Psychological Measurement %D 2009 %T Predictive Control of Speededness in Adaptive Testing %A van der Linden, Wim J. %X

An adaptive testing method is presented that controls the speededness of a test using predictions of the test takers' response times on the candidate items in the pool. Two different types of predictions are investigated: posterior predictions given the actual response times on the items already administered and posterior predictions that use the responses on these items as an additional source of information. In a simulation study with an adaptive test modeled after a test from the Armed Services Vocational Aptitude Battery, the effectiveness of the methods in removing differential speededness from the test was evaluated.

%B Applied Psychological Measurement %V 33 %P 25-41 %U http://apm.sagepub.com/content/33/1/25.abstract %R 10.1177/0146621607314042 %0 Journal Article %J Quality of Life Research %D 2009 %T Replenishing a computerized adaptive test of patient-reported daily activity functioning %A Haley, S. M. %A Ni, P. %A Jette, A. M. %A Tao, W. %A Moed, R. %A Meyers, D. %A Ludlow, L. H. %K *Activities of Daily Living %K *Disability Evaluation %K *Questionnaires %K *User-Computer Interface %K Adult %K Aged %K Cohort Studies %K Computer-Assisted Instruction %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %X PURPOSE: Computerized adaptive testing (CAT) item banks may need to be updated, but before new items can be added, they must be linked to the previous CAT. The purpose of this study was to evaluate 41 pretest items prior to including them into an operational CAT. METHODS: We recruited 6,882 patients with spine, lower extremity, upper extremity, and nonorthopedic impairments who received outpatient rehabilitation in one of 147 clinics across 13 states of the USA. Forty-one new Daily Activity (DA) items were administered along with the Activity Measure for Post-Acute Care Daily Activity CAT (DA-CAT-1) in five separate waves. We compared the scoring consistency with the full item bank, test information function (TIF), person standard errors (SEs), and content range of the DA-CAT-1 to the new CAT (DA-CAT-2) with the pretest items by real data simulations. RESULTS: We retained 29 of the 41 pretest items. Scores from the DA-CAT-2 were more consistent (ICC = 0.90 versus 0.96) than DA-CAT-1 when compared with the full item bank. TIF and person SEs were improved for persons with higher levels of DA functioning, and ceiling effects were reduced from 16.1% to 6.1%. CONCLUSIONS: Item response theory and online calibration methods were valuable in improving the DA-CAT. 
%B Quality of Life Research %7 2009/03/17 %V 18 %P 461-71 %8 May %@ 0962-9343 (Print)0962-9343 (Linking) %G eng %M 19288222 %0 Book Section %D 2009 %T Using automatic item generation to address item demands for CAT %A Lai, H. %A Alves, C. %A Gierl, M. J. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Journal of Applied Measurement %D 2008 %T Binary items and beyond: a simulation of computer adaptive testing using the Rasch partial credit model %A Lange, R. %K *Data Interpretation, Statistical %K *User-Computer Interface %K Educational Measurement/*statistics & numerical data %K Humans %K Illinois %K Models, Statistical %X Past research on Computer Adaptive Testing (CAT) has focused almost exclusively on the use of binary items and minimizing the number of items to be administered. To address this situation, extensive computer simulations were performed using partial credit items with two, three, four, and five response categories. Other variables manipulated include the number of available items, the number of respondents used to calibrate the items, and various manipulations of respondents' true locations. Three item selection strategies were used, and the theoretically optimal Maximum Information method was compared to random item selection and Bayesian Maximum Falsification approaches. The Rasch partial credit model proved to be quite robust to various imperfections, and systematic distortions did occur mainly in the absence of sufficient numbers of items located near the trait or performance levels of interest. The findings further indicate that having small numbers of items is more problematic in practice than having small numbers of respondents to calibrate these items. Most importantly, increasing the number of response categories consistently improved CAT's efficiency as well as the general quality of the results. 
In fact, increasing the number of response categories proved to have a greater positive impact than did the choice of item selection method, as the Maximum Information approach performed only slightly better than the Maximum Falsification approach. Accordingly, issues related to the efficiency of item selection methods are far less important than is commonly suggested in the literature. However, being based on computer simulations only, the preceding presumes that actual respondents behave according to the Rasch model. CAT research could thus benefit from empirical studies aimed at determining whether, and if so, how, selection strategies impact performance. %B Journal of Applied Measurement %7 2008/01/09 %V 9 %P 81-104 %@ 1529-7713 (Print)1529-7713 (Linking) %G eng %M 18180552 %0 Journal Article %J Spine %D 2008 %T Computerized adaptive testing in back pain: Validation of the CAT-5D-QOL %A Kopec, J. A. %A Badii, M. %A McKenna, M. %A Lima, V. D. %A Sayre, E. C. %A Dvorak, M. %K *Disability Evaluation %K *Health Status Indicators %K *Quality of Life %K Adult %K Aged %K Algorithms %K Back Pain/*diagnosis/psychology %K British Columbia %K Diagnosis, Computer-Assisted/*standards %K Feasibility Studies %K Female %K Humans %K Internet %K Male %K Middle Aged %K Predictive Value of Tests %K Questionnaires/*standards %K Reproducibility of Results %X STUDY DESIGN: We have conducted an outcome instrument validation study. OBJECTIVE: Our objective was to develop a computerized adaptive test (CAT) to measure 5 domains of health-related quality of life (HRQL) and assess its feasibility, reliability, validity, and efficiency. SUMMARY OF BACKGROUND DATA: Kopec and colleagues have recently developed item response theory based item banks for 5 domains of HRQL relevant to back pain and suitable for CAT applications. The domains are Daily Activities (DAILY), Walking (WALK), Handling Objects (HAND), Pain or Discomfort (PAIN), and Feelings (FEEL). 
METHODS: An adaptive algorithm was implemented in a web-based questionnaire administration system. The questionnaire included CAT-5D-QOL (5 scales), Modified Oswestry Disability Index (MODI), Roland-Morris Disability Questionnaire (RMDQ), SF-36 Health Survey, and standard clinical and demographic information. Participants were outpatients treated for mechanical back pain at a referral center in Vancouver, Canada. RESULTS: A total of 215 patients completed the questionnaire and 84 completed a retest. On average, patients answered 5.2 items per CAT-5D-QOL scale. Reliability ranged from 0.83 (FEEL) to 0.92 (PAIN) and was 0.92 for the MODI, RMDQ, and Physical Component Summary (PCS-36). The ceiling effect was 0.5% for PAIN compared with 2% for MODI and 5% for RMDQ. The CAT-5D-QOL scales correlated as anticipated with other measures of HRQL and discriminated well according to the level of satisfaction with current symptoms, duration of the last episode, sciatica, and disability compensation. The average relative discrimination index was 0.87 for PAIN, 0.67 for DAILY, and 0.62 for WALK, compared with 0.89 for MODI, 0.80 for RMDQ, and 0.59 for PCS-36. CONCLUSION: The CAT-5D-QOL is feasible, reliable, valid, and efficient in patients with back pain. This methodology can be recommended for use in back pain research and should improve outcome assessment, facilitate comparisons across studies, and reduce patient burden. %B Spine %7 2008/05/23 %V 33 %P 1384-90 %8 May 20 %@ 1528-1159 (Electronic)0362-2436 (Linking) %G eng %M 18496353 %0 Journal Article %J British Journal of Mathematical and Statistical Psychology %D 2008 %T Controlling item exposure and test overlap on the fly in computerized adaptive testing %A Chen, S-Y. %A Lei, P. W. %A Liao, W. H. 
%K *Decision Making, Computer-Assisted %K *Models, Psychological %K Humans %X This paper proposes an on-line version of the Sympson and Hetter procedure with test overlap control (SHT) that can provide item exposure control at both the item and test levels on the fly without iterative simulations. The on-line procedure is similar to the SHT procedure in that exposure parameters are used for simultaneous control of item exposure rates and test overlap rate. The exposure parameters for the on-line procedure, however, are updated sequentially on the fly, rather than through iterative simulations conducted prior to operational computerized adaptive tests (CATs). Unlike the SHT procedure, the on-line version can control item exposure rate and test overlap rate without time-consuming iterative simulations even when item pools or examinee populations have been changed. Moreover, the on-line procedure was found to perform better than the SHT procedure in controlling item exposure and test overlap for examinees who take tests earlier. Compared with two other on-line alternatives, this proposed on-line method provided the best all-around test security control. Thus, it would be an efficient procedure for controlling item exposure and test overlap in CATs. %B British Journal of Mathematical and Statistical Psychology %7 2007/07/26 %V 61 %P 471-92 %8 Nov %@ 0007-1102 (Print)0007-1102 (Linking) %G eng %M 17650362 %0 Journal Article %J International Journal of Testing %D 2008 %T Implementing Sympson-Hetter Item-Exposure Control in a Shadow-Test Approach to Constrained Adaptive Testing %A Veldkamp, Bernard P. %A van der Linden, Wim J. %B International Journal of Testing %V 8 %P 272-289 %U http://www.tandfonline.com/doi/abs/10.1080/15305050802262233 %R 10.1080/15305050802262233 %0 Journal Article %J Psychometrika %D 2008 %T Modern sequential analysis and its application to computerized adaptive testing %A Bartroff, J. %A Finkelman, M. %A Lai, T. L. 
%X After a brief review of recent advances in sequential analysis involving sequential generalized likelihood ratio tests, we discuss their use in psychometric testing and extend the asymptotic optimality theory of these sequential tests to the case of sequentially generated experiments, of particular interest in computerized adaptive testing. We then show how these methods can be used to design adaptive mastery tests, which are asymptotically optimal and are also shown to provide substantial improvements over currently used sequential and fixed length tests. %B Psychometrika %V 73 %P 473-486 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2008 %T A Strategy for Controlling Item Exposure in Multidimensional Computerized Adaptive Testing %A Lee, Yi-Hsuan %A Ip, Edward H. %A Fuh, Cheng-Der %X

Although computerized adaptive tests have enjoyed tremendous growth, solutions for important problems remain unavailable. One problem is the control of item exposure rate. Because adaptive algorithms are designed to select optimal items, they choose items with high discriminating power. Thus, these items are selected more often than others, leading to both overexposure and underutilization of some parts of the item pool. Overused items are often compromised, creating a security problem that could threaten the validity of a test. Building on a previously proposed stratification scheme to control the exposure rate for one-dimensional tests, the authors extend their method to multidimensional tests. A strategy is proposed based on stratification in accordance with a functional of the vector of the discrimination parameter, which can be implemented with minimal computational overhead. Both theoretical and empirical validation studies are provided. Empirical results indicate significant improvement over the commonly used method of controlling exposure rate that requires only a reasonable sacrifice in efficiency.
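In the unidimensional case, the stratification scheme being extended works roughly as follows: rank the pool by discrimination, carve it into strata, and move from low-a to high-a strata as the test progresses, matching difficulty to the interim theta within the current stratum. A sketch under those simplifications (the names and stratum schedule are ours; the paper's functional of the discrimination vector for the multidimensional case is not shown):

```python
def a_stratified_select(theta, pool, position, test_length, n_strata=4):
    """a-stratified item selection sketch.
    pool: list of (a, b) pairs; position: 0-based index of the next
    item to administer. Early items come from low-discrimination
    strata, saving high-a items for when theta is well estimated."""
    ranked = sorted(range(len(pool)), key=lambda i: pool[i][0])  # by a
    per = max(1, len(ranked) // n_strata)
    s = min(position * n_strata // test_length, n_strata - 1)
    stratum = ranked[s * per:(s + 1) * per] or ranked
    # within the stratum, match difficulty to the interim theta
    return min(stratum, key=lambda i: abs(pool[i][1] - theta))
```

Because high-a items are withheld until late in the test, their exposure rates drop without the iterative tuning that probabilistic exposure-control methods require.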

%B Educational and Psychological Measurement %V 68 %P 215-232 %U http://epm.sagepub.com/content/68/2/215.abstract %R 10.1177/0013164407307007 %0 Journal Article %J Disability and Rehabilitation %D 2008 %T Utilizing Rasch measurement models to develop a computer adaptive self-report of walking, climbing, and running %A Velozo, C. A. %A Wang, Y. %A Lehman, L. A. %A Wang, J. H. %X Purpose. The purpose of this paper is to show how the Rasch model can be used to develop a computer adaptive self-report of walking, climbing, and running. Method. Our instrument development work on the walking/climbing/running construct of the ICF Activity Measure was used to show how to develop a computer adaptive test (CAT). Fit of the items to the Rasch model and validation of the item difficulty hierarchy was accomplished using Winsteps software. Standard error was used as a stopping rule for the CAT. Finally, person abilities were connected to item difficulties using Rasch analysis ‘maps’. Results. All but the walking one mile item fit the Rasch measurement model. A CAT was developed which selectively presented items based on the last calibrated person ability measure and was designed to stop when standard error decreased to a pre-set criterion. Finally, person ability measures were connected to the ability to perform specific walking/climbing/running activities using Rasch maps. Conclusions. Rasch measurement models can be useful in developing CAT measures for rehabilitation and disability. In addition to CATs reducing respondent burden, the connection of person measures to item difficulties may be important for the clinical interpretation of measures. %B Disability and Rehabilitation %V 30 %P 458-467 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2007 %T Computerizing Organizational Attitude Surveys %A Mueller, Karsten %A Liebig, Christian %A Hattrup, Keith %X

Two quasi-experimental field studies were conducted to evaluate the psychometric equivalence of computerized and paper-and-pencil job satisfaction measures. The present research extends previous work in the area by providing better control of common threats to validity in quasi-experimental research on test mode effects and by evaluating a more comprehensive measurement model for job attitudes. Results of both studies demonstrated substantial equivalence of the computerized measure with the paper-and-pencil version. Implications for the practical use of computerized organizational attitude surveys are discussed.

%B Educational and Psychological Measurement %V 67 %P 658-678 %U http://epm.sagepub.com/content/67/4/658.abstract %R 10.1177/0013164406292084 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2007 %T Conditional Item-Exposure Control in Adaptive Testing Using Item-Ineligibility Probabilities %A van der Linden, Wim J. %A Veldkamp, Bernard P. %X

Two conditional versions of the exposure-control method with item-ineligibility constraints for adaptive testing in van der Linden and Veldkamp (2004) are presented. The first version is for unconstrained item selection, the second for item selection with content constraints imposed by the shadow-test approach. In both versions, the exposure rates of the items are controlled using probabilities of item ineligibility given θ that adapt the exposure rates automatically to a goal value for the items in the pool. In an extensive empirical study with an adaptive version of the Law School Admission Test, the authors show how the method can be used to drive conditional exposure rates below goal values as low as 0.025. Obviously, the price to be paid for minimal exposure rates is a decrease in the accuracy of the ability estimates. This trend is illustrated with empirical data.

%B Journal of Educational and Behavioral Statistics %V 32 %P 398-418 %U http://jeb.sagepub.com/cgi/content/abstract/32/4/398 %R 10.3102/1076998606298044 %0 Journal Article %J Journal of Educational Measurement %D 2007 %T Detecting Differential Speededness in Multistage Testing %A van der Linden, Wim J. %A Breithaupt, Krista %A Chuah, Siang Chee %A Zhang, Yanwei %X

A potential undesirable effect of multistage testing is differential speededness, which happens if some of the test takers run out of time because they receive subtests with items that are more time intensive than others. This article shows how a probabilistic response-time model can be used for estimating differences in time intensities and speed between subtests and test takers and detecting differential speededness. An empirical data set for a multistage test in the computerized CPA Exam was used to demonstrate the procedures. Although the more difficult subtests appeared to have items that were more time intensive than the easier subtests, an analysis of the residual response times did not reveal any significant differential speededness because the time limit appeared to be appropriate. In a separate analysis, within each of the subtests, we found minor but consistent patterns of residual times that are believed to be due to a warm-up effect, that is, test takers spending more time on the initial items than they actually need.

%B Journal of Educational Measurement %V 44 %P 117–130 %U http://dx.doi.org/10.1111/j.1745-3984.2007.00030.x %R 10.1111/j.1745-3984.2007.00030.x %0 Journal Article %J Quality of Life Research %D 2007 %T The future of outcomes measurement: item banking, tailored short-forms, and computerized adaptive assessment %A Cella, D. %A Gershon, R. C. %A Lai, J-S. %A Choi, S. W. %X The use of item banks and computerized adaptive testing (CAT) begins with clear definitions of important outcomes, and references those definitions to specific questions gathered into large and well-studied pools, or “banks” of items. Items can be selected from the bank to form customized short scales, or can be administered in a sequence and length determined by a computer programmed for precision and clinical relevance. Although far from perfect, such item banks can form a common definition and understanding of human symptoms and functional problems such as fatigue, pain, depression, mobility, social function, sensory function, and many other health concepts that we can only measure by asking people directly. The support of the National Institutes of Health (NIH), as witnessed by its cooperative agreement with measurement experts through the NIH Roadmap Initiative known as PROMIS (www.nihpromis.org), is a big step in that direction. Our approach to item banking and CAT is practical; as focused on application as it is on science or theory. From a practical perspective, we frequently must decide whether to re-write and retest an item, add more items to fill gaps (often at the ceiling of the measure), re-test a bank after some modifications, or split up a bank into units that are more unidimensional, yet less clinically relevant or complete. These decisions are not easy, and yet they are rarely unforgiving. 
We encourage people to build practical tools that are capable of producing multiple short form measures and CAT administrations from common banks, and to further our understanding of these banks with various clinical populations and ages, so that with time the scores that emerge from these many activities begin to have not only a common metric and range, but a shared meaning and understanding across users. In this paper, we provide an overview of item banking and CAT, discuss our approach to item banking and its byproducts, describe testing options, discuss an example of CAT for fatigue, and discuss models for long term sustainability of an entity such as PROMIS. Some barriers to success include limitations in the methods themselves, controversies and disagreements across approaches, and end-user reluctance to move away from the familiar. %B Quality of Life Research %V 16 %P 133-141 %@ 0962-9343 %G eng %0 Journal Article %J Medical Care %D 2007 %T Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS) %A Reeve, B. B. %A Hays, R. D. %A Bjorner, J. B. %A Cook, K. F. %A Crane, P. K. %A Teresi, J. A. %A Thissen, D. %A Revicki, D. A. %A Weiss, D. J. %A Hambleton, R. K. %A Liu, H. %A Gershon, R. C. %A Reise, S. P. %A Lai, J. S. %A Cella, D. %K *Health Status %K *Information Systems %K *Quality of Life %K *Self Disclosure %K Adolescent %K Adult %K Aged %K Calibration %K Databases as Topic %K Evaluation Studies as Topic %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %K Psychometrics %K Questionnaires/standards %K United States %X BACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. 
OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment. %B Medical Care %7 2007/04/20 %V 45 %P S22-31 %8 May %@ 0025-7079 (Print) %G eng %M 17443115 %0 Book Section %D 2007 %T Some thoughts on controlling item exposure in adaptive testing %A Lewis, C. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Journal of Pain Symptom Management %D 2007 %T A system for interactive assessment and management in palliative care %A Chang, C-H. %A Boni-Saenz, A. A. %A Durazo-Arvizu, R. A. %A DesHarnais, S. %A Lau, D. T. %A Emanuel, L. L. %K *Needs Assessment %K Humans %K Medical Informatics/*organization & administration %K Palliative Care/*organization & administration %X The availability of psychometrically sound and clinically relevant screening, diagnosis, and outcome evaluation tools is essential to high-quality palliative care assessment and management. Such data will enable us to improve patient evaluations, prognoses, and treatment selections, and to increase patient satisfaction and quality of life. 
To accomplish these goals, medical care needs more precise, efficient, and comprehensive tools for data acquisition, analysis, interpretation, and management. We describe a system for interactive assessment and management in palliative care (SIAM-PC), which is patient centered, model driven, database derived, evidence based, and technology assisted. The SIAM-PC is designed to reliably measure the multiple dimensions of patients' needs for palliative care, and then to provide information to clinicians, patients, and the patients' families to achieve optimal patient care, while improving our capacity for doing palliative care research. This system is innovative in its application of the state-of-the-science approaches, such as item response theory and computerized adaptive testing, to many of the significant clinical problems related to palliative care. %B Journal of Pain Symptom Management %7 2007/03/16 %V 33 %P 745-55 %@ 0885-3924 (Print) %G eng %M 17360148 %0 Journal Article %J Journal of Educational Measurement %D 2006 %T Comparing Methods of Assessing Differential Item Functioning in a Computerized Adaptive Testing Environment %A Lei, Pui-Wa %A Chen, Shu-Ying %A Yu, Lan %X

Mantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item-response-theory-likelihood-ratio test (IRT-LRT), capable of detecting both unidirectional and non-unidirectional DIF, to the CAT environment in which pretest items are assumed to be seeded in CATs but not used for trait estimation. The proposed adaptation methods were evaluated with simulated data under different sample size ratios and impact conditions in terms of Type I error, power, and specificity in identifying the form of DIF. The adapted LR and IRT-LRT procedures are more powerful than the CAT version of SIBTEST for non-unidirectional DIF detection. The good Type I error control provided by IRT-LRT under extremely unequal sample sizes and large impact is encouraging. Implications of these and other findings are discussed.

%B Journal of Educational Measurement %V 43 %P 245–264 %U http://dx.doi.org/10.1111/j.1745-3984.2006.00015.x %R 10.1111/j.1745-3984.2006.00015.x %0 Journal Article %J J Educ Eval Health Prof %D 2006 %T Estimation of an examinee's ability in the web-based computerized adaptive testing program IRT-CAT %A Lee, Y. H. %A Park, J. H. %A Park, I. Y.
%X We developed a program to estimate an examinee's ability in order to provide freely available access to a web-based computerized adaptive testing (CAT) program. We used PHP and JavaScript as the programming languages, PostgreSQL as the database management system on an Apache web server and Linux as the operating system. A system which allows for user input and searching within inputted items and creates tests was constructed. We performed an ability estimation on each test based on a Rasch model and 2- or 3-parameter logistic models. Our system provides an algorithm for a web-based CAT, replacing previous personal computer-based ones, and makes it possible to estimate an examinee's ability immediately at the end of the test. %B J Educ Eval Health Prof %7 2006/01/01 %V 3 %P 4 %@ 1975-5937 (Electronic) %G eng %M 19223996 %2 2631187 %0 Journal Article %J Journal of Applied Measurement %D 2006 %T Expansion of a physical function item bank and development of an abbreviated form for clinical research %A Bode, R. K. %A Lai, J-S. %A Dineen, K. %A Heinemann, A. W. %A Shevrin, D. %A Von Roenn, J. %A Cella, D. %K clinical research %K computerized adaptive testing %K performance levels %K physical function item bank %K Psychometrics %K test reliability %K Test Validity %X We expanded an existing 33-item physical function (PF) item bank with a sufficient number of items to enable computerized adaptive testing (CAT). Ten items were written to expand the bank and the new item pool was administered to 295 people with cancer. For this analysis of the new pool, seven poorly performing items were identified for further examination. This resulted in a bank with items that define an essentially unidimensional PF construct, cover a wide range of that construct, reliably measure the PF of persons with cancer, and distinguish differences in self-reported functional performance levels.
We also developed a 5-item (static) assessment form ("BriefPF") that can be used in clinical research to express scores on the same metric as the overall bank. The BriefPF was compared to the PF-10 from the Medical Outcomes Study SF-36. Both short forms significantly differentiated persons across functional performance levels. While the entire bank was more precise across the PF continuum than either short form, there were differences in the area of the continuum in which each short form was more precise: the BriefPF was more precise than the PF-10 at the lower functional levels and the PF-10 was more precise than the BriefPF at the higher levels. Future research on this bank will include the development of a CAT version, the PF-CAT. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Applied Measurement %I Richard M Smith: US %V 7 %P 1-15 %@ 1529-7713 (Print) %G eng %M 2006-01262-001 %0 Journal Article %J Quality of Life Research %D 2006 %T Factor analysis techniques for assessing sufficient unidimensionality of cancer related fatigue %A Lai, J-S. %A Crane, P. K. %A Cella, D. %K *Factor Analysis, Statistical %K *Quality of Life %K Aged %K Chicago %K Fatigue/*etiology %K Female %K Humans %K Male %K Middle Aged %K Neoplasms/*complications %K Questionnaires %X BACKGROUND: Fatigue is the most common unrelieved symptom experienced by people with cancer. The purpose of this study was to examine whether cancer-related fatigue (CRF) can be summarized using a single score, that is, whether CRF is sufficiently unidimensional for measurement approaches that require or assume unidimensionality. We evaluated this question using factor analysis techniques including the theory-driven bi-factor model. 
METHODS: Five hundred and fifty five cancer patients from the Chicago metropolitan area completed a 72-item fatigue item bank, covering a range of fatigue-related concerns including intensity, frequency and interference with physical, mental, and social activities. Dimensionality was assessed using exploratory and confirmatory factor analysis (CFA) techniques. RESULTS: Exploratory factor analysis (EFA) techniques identified from 1 to 17 factors. The bi-factor model suggested that CRF was sufficiently unidimensional. CONCLUSIONS: CRF can be considered sufficiently unidimensional for applications that require unidimensionality. One such application, item response theory (IRT), will facilitate the development of short-form and computer-adaptive testing. This may further enable practical and accurate clinical assessment of CRF. %B Quality of Life Research %V 15 %P 1179-90 %8 Sep %G eng %M 17001438 %0 Journal Article %J Applied Measurement in Education %D 2006 %T How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation %A Chuah, Siang Chee %A F Drasgow %A Luecht, Richard %B Applied Measurement in Education %V 19 %P 241-255 %U http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_5 %R 10.1207/s15324818ame1903_5 %0 Journal Article %J Medical Care %D 2006 %T Item banks and their potential applications to health status assessment in diverse populations %A Hahn, E. A. %A Cella, D. %A Bode, R. K. %A Gershon, R. C. %A Lai, J. S. %X In the context of an ethnically diverse, aging society, attention is increasingly turning to health-related quality of life measurement to evaluate healthcare and treatment options for chronic diseases. When evaluating and treating symptoms and concerns such as fatigue, pain, or physical function, reliable and accurate assessment is a priority. 
Modern psychometric methods have enabled us to move from long, static tests that provide inefficient and often inaccurate assessment of individual patients, to computerized adaptive tests (CATs) that can precisely measure individuals on health domains of interest. These modern methods, collectively referred to as item response theory (IRT), can produce calibrated "item banks" from larger pools of questions. From these banks, CATs can be conducted on individuals to produce their scores on selected domains. Item banks allow for comparison of patients across different question sets because the patient's score is expressed on a common scale. Other advantages of using item banks include flexibility in terms of the degree of precision desired; interval measurement properties under most circumstances; realistic capability for accurate individual assessment over time (using CAT); and measurement equivalence across different patient populations. This work summarizes the process used in the creation and evaluation of item banks and reviews their potential contributions and limitations regarding outcome assessment and patient care, particularly when they are applied across people of different cultural backgrounds. %B Medical Care %V 44 %P S189-S197 %8 Nov %G eng %M 17060827 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2006 %T Measurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the pediatric evaluation of disability inventory %A Haley, S. M. %A Ni, P. %A Ludlow, L. H. %A Fragala-Pinkham, M. A. 
%K *Disability Evaluation %K *Pediatrics %K Adolescent %K Child %K Child, Preschool %K Computers %K Disabled Persons/*classification/rehabilitation %K Efficiency %K Humans %K Infant %K Outcome Assessment (Health Care) %K Psychometrics %K Self Care %X OBJECTIVE: To compare the measurement efficiency and precision of a multidimensional computer adaptive testing (M-CAT) application to a unidimensional CAT (U-CAT) comparison using item bank data from 2 of the functional skills scales of the Pediatric Evaluation of Disability Inventory (PEDI). DESIGN: Using existing PEDI mobility and self-care item banks, we compared the stability of item calibrations and model fit between unidimensional and multidimensional Rasch models and compared the efficiency and precision of the U-CAT- and M-CAT-simulated assessments to a random draw of items. SETTING: Pediatric rehabilitation hospital and clinics. PARTICIPANTS: Clinical and normative samples. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Not applicable. RESULTS: The M-CAT had greater levels of precision and efficiency than the separate mobility and self-care U-CAT versions when using a similar number of items for each PEDI subdomain. Equivalent estimation of mobility and self-care scores can be achieved with a 25% to 40% item reduction with the M-CAT compared with the U-CAT. CONCLUSIONS: M-CAT applications appear to have both precision and efficiency advantages compared with separate U-CAT assessments when content subdomains have a high correlation. Practitioners may also realize interpretive advantages of reporting test score information for each subdomain when separate clinical inferences are desired. 
%B Archives of Physical Medicine and Rehabilitation %7 2006/08/29 %V 87 %P 1223-9 %8 Sep %@ 0003-9993 (Print) %G eng %M 16935059 %0 Journal Article %J Applied Measurement in Education %D 2006 %T A testlet assembly design for the uniform CPA Examination %A Luecht, Richard %A Brumfield, Terry %A Breithaupt, Krista %B Applied Measurement in Education %V 19 %P 189-202 %U http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_2 %R 10.1207/s15324818ame1903_2 %0 Journal Article %J British Journal of Mathematical and Statistical Psychology %D 2005 %T Computerized adaptive testing: a mixture item selection approach for constrained situations %A Leung, C. K. %A Chang, Hua-Hua %A Hau, K. T. %K *Computer-Aided Design %K *Educational Measurement/methods %K *Models, Psychological %K Humans %K Psychometrics/methods %X In computerized adaptive testing (CAT), traditionally the most discriminating items are selected to provide the maximum information so as to attain the highest efficiency in trait (theta) estimation. The maximum information (MI) approach typically results in unbalanced item exposure and hence high item-overlap rates across examinees. Recently, Yi and Chang (2003) proposed the multiple stratification (MS) method to remedy the shortcomings of MI. In MS, items are first sorted according to content, then difficulty and finally discrimination parameters. As discriminating items are used strategically, MS offers a better utilization of the entire item pool. However, for testing with imposed non-statistical constraints, this new stratification approach may not maintain its high efficiency. Through a series of simulation studies, this research explored the possible benefits of a mixture item selection approach (MS-MI), integrating the MS and MI approaches, in testing with non-statistical constraints. 
In all simulation conditions, MS consistently outperformed the other two competing approaches in item pool utilization, while the MS-MI and the MI approaches yielded higher measurement efficiency and offered better conformity to the constraints. Furthermore, the MS-MI approach was shown to perform better than MI on all evaluation criteria when control of item exposure was imposed. %B British Journal of Mathematical and Statistical Psychology %7 2005/11/19 %V 58 %P 239-57 %8 Nov %@ 0007-1102 (Print) 0007-1102 (Linking) %G eng %M 16293199 %0 Government Document %D 2005 %T Computerizing statewide assessments in Minnesota: A report on the feasibility of converting the Minnesota Comprehensive Assessments to a computerized adaptive format %A Peterson, K.A. %A Davison, M. L. %A Hjelseth, L. %I Office of Educational Accountability, College of Education and Human Development, University of Minnesota %G eng %0 Journal Article %J Applied Psychological Measurement %D 2005 %T Controlling Item Exposure and Test Overlap in Computerized Adaptive Testing %A Chen, Shu-Ying %A Lei, Pui-Wa %X

This article proposes an item exposure control method, which is the extension of the Sympson and Hetter procedure and can provide item exposure control at both the item and test levels. Item exposure rate and test overlap rate are two indices commonly used to track item exposure in computerized adaptive tests. By considering both indices, item exposure can be monitored at both the item and test levels. To control the item exposure rate and test overlap rate simultaneously, the modified procedure attempted to control not only the maximum value but also the variance of item exposure rates. Results indicated that the item exposure rate and test overlap rate could be controlled simultaneously by implementing the modified procedure. Item exposure control was improved and precision of trait estimation decreased when a prespecified maximum test overlap rate was stringent.

%B Applied Psychological Measurement %V 29 %P 204-217 %U http://apm.sagepub.com/content/29/3/204.abstract %R 10.1177/0146621604271495 %0 Journal Article %J Evaluation and the Health Professions %D 2005 %T Data pooling and analysis to build a preliminary item bank: an example using bowel function in prostate cancer %A Eton, D. T. %A Lai, J. S. %A Cella, D. %A Reeve, B. B. %A Talcott, J. A. %A Clark, J. A. %A McPherson, C. P. %A Litwin, M. S. %A Moinpour, C. M.
%K *Quality of Life %K *Questionnaires %K Adult %K Aged %K Data Collection/methods %K Humans %K Intestine, Large/*physiopathology %K Male %K Middle Aged %K Prostatic Neoplasms/*physiopathology %K Psychometrics %K Research Support, Non-U.S. Gov't %K Statistics, Nonparametric %X Assessing bowel function (BF) in prostate cancer can help determine therapeutic trade-offs. We determined the components of BF commonly assessed in prostate cancer studies as an initial step in creating an item bank for clinical and research application. We analyzed six archived data sets representing 4,246 men with prostate cancer. Thirty-one items from validated instruments were available for analysis. Items were classified into domains (diarrhea, rectal urgency, pain, bleeding, bother/distress, and other) then subjected to conventional psychometric and item response theory (IRT) analyses. Items fit the IRT model if the ratio between observed and expected item variance was between 0.60 and 1.40. Four of 31 items had inadequate fit in at least one analysis. Poorly fitting items included bleeding (2), rectal urgency (1), and bother/distress (1). A fifth item assessing hemorrhoids was poorly correlated with other items. Our analyses supported four related components of BF: diarrhea, rectal urgency, pain, and bother/distress. %B Evaluation and the Health Professions %V 28 %P 142-59 %G eng %M 15851770 %0 Journal Article %J Journal of Educational Measurement %D 2005 %T Increasing the homogeneity of CAT's item-exposure rates by minimizing or maximizing varied target functions while assembling shadow tests %A Li, Y. H. %A Schafer, W. D. 
%K algorithm %K computerized adaptive testing %K item exposure rate %K shadow test %K varied target function %X A computerized adaptive testing (CAT) algorithm that has the potential to increase the homogeneity of CAT's item-exposure rates without significantly sacrificing the precision of ability estimates was proposed and assessed in the shadow-test (van der Linden & Reese, 1998) CAT context. This CAT algorithm was formed by a combination of maximizing or minimizing varied target functions while assembling shadow tests. There were four target functions to be separately used in the first, second, third, and fourth quarters of the CAT. The elements to be used in the four functions were associated with (a) a random number assigned to each item, (b) the absolute difference between an examinee's current ability estimate and an item difficulty, (c) the absolute difference between an examinee's current ability estimate and an optimum item difficulty, and (d) item information. The results indicated that this combined CAT fully utilized all the items in the pool, reduced the maximum exposure rates, and achieved more homogeneous exposure rates. Moreover, its precision in recovering ability estimates was similar to that of the maximum item-information method. The combined CAT method resulted in the best overall results compared with the other individual CAT item-selection methods. The findings from the combined CAT are encouraging. Future uses are discussed. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Educational Measurement %I Blackwell Publishing: United Kingdom %V 42 %P 245-269 %@ 0022-0655 (Print) %G eng %M 2005-10716-002 %0 Journal Article %J Journal of Clinical Epidemiology %D 2005 %T An item bank was created to improve the measurement of cancer-related fatigue %A Lai, J-S. %A Cella, D. %A Dineen, K. %A Bode, R. %A Von Roenn, J. %A Gershon, R. C. %A Shevrin, D.
%K Adult %K Aged %K Aged, 80 and over %K Factor Analysis, Statistical %K Fatigue/*etiology/psychology %K Female %K Humans %K Male %K Middle Aged %K Neoplasms/*complications/psychology %K Psychometrics %K Questionnaires %X OBJECTIVE: Cancer-related fatigue (CRF) is one of the most common unrelieved symptoms experienced by patients. CRF is underrecognized and undertreated due to a lack of clinically sensitive instruments that integrate easily into clinics. Modern computerized adaptive testing (CAT) can overcome these obstacles by enabling precise assessment of fatigue without requiring the administration of a large number of questions. A working item bank is essential for development of a CAT platform. The present report describes the building of an operational item bank for use in clinical settings with the ultimate goal of improving CRF identification and treatment. STUDY DESIGN AND SETTING: The sample included 301 cancer patients. Psychometric properties of items were examined by using Rasch analysis, an Item Response Theory (IRT) model. RESULTS AND CONCLUSION: The final bank includes 72 items. These 72 unidimensional items explained 57.5% of the variance, based on factor analysis results. Excellent internal consistency (alpha=0.99) and acceptable item-total correlation were found (range: 0.51-0.85). The 72 items covered a reasonable range of the fatigue continuum. No significant ceiling effects, floor effects, or gaps were found. A sample short form was created for demonstration purposes. The resulting bank is amenable to the development of a CAT platform. %B Journal of Clinical Epidemiology %7 2005/02/01 %V 58 %P 190-7 %8 Feb %@ 0895-4356 (Print) 0895-4356 (Linking) %G eng %9 Multicenter Study %M 15680754 %0 Journal Article %J Journal of Pain and Symptom Management %D 2005 %T An item response theory-based pain item bank can enhance measurement precision %A Lai, J-S. %A Dineen, K. %A Reeve, B. B. %A Von Roenn, J. %A Shervin, D. %A McGuire, M. %A Bode, R. K.
%A Paice, J. %A Cella, D. %K computerized adaptive testing %X Cancer-related pain is often under-recognized and undertreated. This is partly due to the lack of appropriate assessments, which need to be comprehensive and precise yet easily integrated into clinics. Computerized adaptive testing (CAT) can enable precise-yet-brief assessments by only selecting the most informative items from a calibrated item bank. The purpose of this study was to create such a bank. The sample included 400 cancer patients who were asked to complete 61 pain-related items. Data were analyzed using factor analysis and the Rasch model. The final bank consisted of 43 items which satisfied the measurement requirement of factor analysis and the Rasch model, demonstrated high internal consistency and reasonable item-total correlations, and discriminated patients with differing degrees of pain. We conclude that this bank demonstrates good psychometric properties, is sensitive to pain reported by patients, and can be used as the foundation for a CAT pain-testing platform for use in clinical practice. %B Journal of Pain and Symptom Management %V 30 %P 278-88 %G eng %M 16183012 %0 Journal Article %J Alcoholism: Clinical & Experimental Research %D 2005 %T Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire %A Kahler, C. W. %A Strong, D. R. %A Read, J. P. %A De Boeck, P. %A Wilson, M. %A Acton, G. S. %A Palfai, T. P. %A Wood, M. D. %A Mehta, P. D. %A Neale, M. C. %A Flay, B. R. %A Conklin, C. A. %A Clayton, R. R. %A Tiffany, S. T. %A Shiffman, S. %A Krueger, R. F. %A Nichol, P. E. %A Hicks, B. M. %A Markon, K. E. %A Patrick, C. J. %A Iacono, William G. %A McGue, Matt %A Langenbucher, J. W. %A Labouvie, E. %A Martin, C. S. %A Sanjuan, P. M. %A Bavly, L. %A Kirisci, L. %A Chung, T. %A Vanyukov, M. %A Dunn, M. %A Tarter, R. %A Handel, R. W. %A Ben-Porath, Y. S. %A Watt, M. 
%K Psychometrics %K Substance-Related Disorders %X Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias., Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis. An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample., Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items., Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. 
To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided., (C)2005Research Society on AlcoholismAn important, sometimes controversial feature of all psychological phenomena is whether they are categorical or dimensional. A conceptual and psychometric framework is described for distinguishing whether the latent structure behind manifest categories (e.g., psychiatric diagnoses, attitude groups, or stages of development) is category-like or dimension-like. Being dimension-like requires (a) within-category heterogeneity and (b) between-category quantitative differences. Being category-like requires (a) within-category homogeneity and (b) between-category qualitative differences. The relation between this classification and abrupt versus smooth differences is discussed. Hybrid structures are possible. Being category-like is itself a matter of degree; the authors offer a formalized framework to determine this degree. Empirical applications to personality disorders, attitudes toward capital punishment, and stages of cognitive development illustrate the approach., (C) 2005 by the American Psychological AssociationThe authors conducted Rasch model ( G. Rasch, 1960) analyses of items from the Young Adult Alcohol Problems Screening Test (YAAPST; S. C. Hurlbut & K. J. Sher, 1992) to examine the relative severity and ordering of alcohol problems in 806 college students. Items appeared to measure a single dimension of alcohol problem severity, covering a broad range of the latent continuum. Items fit the Rasch model well, with less severe symptoms reliably preceding more severe symptoms in a potential progression toward increasing levels of problem severity. However, certain items did not index problem severity consistently across demographic subgroups. 
A shortened, alternative version of the YAAPST is proposed, and a norm table is provided that allows for a linking of total YAAPST scores to expected symptom expression. (C) 2004 by the American Psychological Association. A didactic on latent growth curve modeling for ordinal outcomes is presented. The conceptual aspects of modeling growth with ordinal variables and the notion of threshold invariance are illustrated graphically using a hypothetical example. The ordinal growth model is described in terms of 3 nested models: (a) multivariate normality of the underlying continuous latent variables (yt) and its relationship with the observed ordinal response pattern (Yt), (b) threshold invariance over time, and (c) growth model for the continuous latent variable on a common scale. Algebraic implications of the model restrictions are derived, and practical aspects of fitting ordinal growth models are discussed with the help of an empirical example and Mx script (M. C. Neale, S. M. Boker, G. Xie, & H. H. Maes, 1999). The necessary conditions for the identification of growth models with ordinal data and the methodological implications of the model of threshold invariance are discussed. (C) 2004 by the American Psychological Association. Recent research points toward the viability of conceptualizing alcohol problems as arrayed along a continuum. Nevertheless, modern statistical techniques designed to scale multiple problems along a continuum (latent trait modeling; LTM) have rarely been applied to alcohol problems. This study applies LTM methods to data on 110 problems reported during in-person interviews of 1,348 middle-aged men (mean age = 43) from the general population. The results revealed a continuum of severity linking the 110 problems, ranging from heavy and abusive drinking, through tolerance and withdrawal, to serious complications of alcoholism.
These results indicate that alcohol problems can be arrayed along a dimension of severity and emphasize the relevance of LTM to informing the conceptualization and assessment of alcohol problems. (C) 2004 by the American Psychological Association. Item response theory (IRT) is supplanting classical test theory as the basis for measures development. This study demonstrated the utility of IRT for evaluating DSM-IV diagnostic criteria. Data on alcohol, cannabis, and cocaine symptoms from 372 adult clinical participants interviewed with the Composite International Diagnostic Interview-Expanded Substance Abuse Module (CIDI-SAM) were analyzed with Mplus (B. Muthen & L. Muthen, 1998) and MULTILOG (D. Thissen, 1991) software. Tolerance and legal problems criteria were dropped because of poor fit with a unidimensional model. Item response curves, test information curves, and testing of variously constrained models suggested that DSM-IV criteria in the CIDI-SAM discriminate between only impaired and less impaired cases and may not be useful to scale case severity. IRT can be used to study the construct validity of DSM-IV diagnoses and to identify diagnostic criteria with poor performance. (C) 2004 by the American Psychological Association. This study examined the psychometric characteristics of an index of substance use involvement using item response theory. The sample consisted of 292 men and 140 women who qualified for a Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.; American Psychiatric Association, 1987) substance use disorder (SUD) diagnosis and 293 men and 445 women who did not qualify for a SUD diagnosis. The results indicated that men had a higher probability of endorsing substance use compared with women. The index significantly predicted health, psychiatric, and psychosocial disturbances as well as level of substance use behavior and severity of SUD after a 2-year follow-up.
Finally, this index is a reliable and useful prognostic indicator of the risk for SUD and the medical and psychosocial sequelae of drug consumption. (C) 2002 by the American Psychological Association. Comparability, validity, and impact of loss of information of a computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 140 Veterans Affairs hospital patients. The countdown method (Butcher, Keller, & Bacon, 1985) was used to adaptively administer Scales L (Lie) and F (Frequency), the 10 clinical scales, and the 15 content scales. Participants completed the MMPI-2 twice, in 1 of 2 conditions: computerized conventional test-retest, or computerized conventional-computerized adaptive. Mean profiles and test-retest correlations across modalities were comparable. Correlations between MMPI-2 scales and criterion measures supported the validity of the countdown method, although some attenuation of validity was suggested for certain health-related items. Loss of information incurred with this mode of adaptive testing has minimal impact on test validity. Item and time savings were substantial. (C) 1999 by the American Psychological Association %B Alcoholism: Clinical & Experimental Research %V 29 %P 1180-1189 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2005 %T Trait parameter recovery using multidimensional computerized adaptive testing in reading and mathematics %A Li, Y. H. %X Under a multidimensional item response theory (MIRT) computerized adaptive testing (CAT) testing scenario, a trait estimate (θ) in one dimension will provide clues for subsequently seeking a solution in other dimensions. This feature may enhance the efficiency of MIRT CAT’s item selection and its scoring algorithms compared with its counterpart, the unidimensional CAT (UCAT). The present study used existing Reading and Math test data to generate simulated item parameters.
A confirmatory item factor analysis model was applied to the data using NOHARM to produce interpretable MIRT item parameters. Results showed that MIRT CAT, conditional on the constraints, was quite capable of producing accurate estimates on both measures. Compared with UCAT, MIRT CAT slightly increased the accuracy of both trait estimates, especially for the low-level or high-level trait examinees in both measures, and reduced the rate of unused items in the item pool. Index terms: computerized adaptive testing (CAT), item response theory (IRT), dimensionality, 0-1 linear programming, constraints, item exposure, reading assessment, mathematics assessment. %B Applied Psychological Measurement %V 29 %P 3-25 %@ 0146-6216 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2005 %T Trait Parameter Recovery Using Multidimensional Computerized Adaptive Testing in Reading and Mathematics %A Li, Yuan H. %A Schafer, William D. %X

Under a multidimensional item response theory (MIRT) computerized adaptive testing (CAT) testing scenario, a trait estimate (θ) in one dimension will provide clues for subsequently seeking a solution in other dimensions. This feature may enhance the efficiency of MIRT CAT’s item selection and its scoring algorithms compared with its counterpart, the unidimensional CAT (UCAT). The present study used existing Reading and Math test data to generate simulated item parameters. A confirmatory item factor analysis model was applied to the data using NOHARM to produce interpretable MIRT item parameters. Results showed that MIRT CAT, conditional on the constraints, was quite capable of producing accurate estimates on both measures. Compared with UCAT, MIRT CAT slightly increased the accuracy of both trait estimates, especially for the low-level or high-level trait examinees in both measures, and reduced the rate of unused items in the item pool.

%B Applied Psychological Measurement %V 29 %P 3-25 %U http://apm.sagepub.com/content/29/1/3.abstract %R 10.1177/0146621604270667 %0 Journal Article %J Medical Care %D 2004 %T Activity outcome measurement for postacute care %A Haley, S. M. %A Coster, W. J. %A Andres, P. L. %A Ludlow, L. H. %A Ni, P. %A Bond, T. L. %A Sinclair, S. J. %A Jette, A. M. %K *Self Efficacy %K *Sickness Impact Profile %K Activities of Daily Living/*classification/psychology %K Adult %K Aftercare/*standards/statistics & numerical data %K Aged %K Boston %K Cognition/physiology %K Disability Evaluation %K Factor Analysis, Statistical %K Female %K Human %K Male %K Middle Aged %K Movement/physiology %K Outcome Assessment (Health Care)/*methods/statistics & numerical data %K Psychometrics %K Questionnaires/standards %K Rehabilitation/*standards/statistics & numerical data %K Reproducibility of Results %K Sensitivity and Specificity %K Support, U.S. Gov't, Non-P.H.S. %K Support, U.S. Gov't, P.H.S. %X BACKGROUND: Efforts to evaluate the effectiveness of a broad range of postacute care services have been hindered by the lack of conceptually sound and comprehensive measures of outcomes. It is critical to determine a common underlying structure before employing current methods of item equating across outcome instruments for future item banking and computer-adaptive testing applications. OBJECTIVE: To investigate the factor structure, reliability, and scale properties of items underlying the Activity domains of the International Classification of Functioning, Disability and Health (ICF) for use in postacute care outcome measurement. METHODS: We developed a 41-item Activity Measure for Postacute Care (AM-PAC) that assessed an individual's execution of discrete daily tasks in his or her own environment across major content domains as defined by the ICF. 
We evaluated the reliability and discriminant validity of the prototype AM-PAC in 477 individuals in active rehabilitation programs across 4 rehabilitation settings using factor analyses, tests of item scaling, internal consistency reliability analyses, Rasch item response theory modeling, residual component analysis, and modified parallel analysis. RESULTS: Results from an initial exploratory factor analysis produced 3 distinct, interpretable factors that accounted for 72% of the variance: Applied Cognition (44%), Personal Care & Instrumental Activities (19%), and Physical & Movement Activities (9%); these 3 activity factors were verified by a confirmatory factor analysis. Scaling assumptions were met for each factor in the total sample and across diagnostic groups. Internal consistency reliability was high for the total sample (Cronbach alpha = 0.92 to 0.94), and for specific diagnostic groups (Cronbach alpha = 0.90 to 0.95). Rasch scaling, residual factor, differential item functioning, and modified parallel analyses supported the unidimensionality and goodness of fit of each unique activity domain. CONCLUSIONS: This 3-factor model of the AM-PAC can form the conceptual basis for common-item equating and computer-adaptive applications, leading to a comprehensive system of outcome instruments for postacute care settings. %B Medical Care %V 42 %P I49-161 %G eng %M 14707755 %0 Journal Article %J WSEAS Transactions on Communications %D 2004 %T Adaptive exploration of user knowledge in computer based testing %A Lamboudis, D. %A Economides, A. A. %B WSEAS Transactions on Communications %V 3 (1) %P 322-327 %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2004 %T Adaptive Testing With Regression Trees in the Presence of Multidimensionality %A Yan, Duanli %A Lewis, Charles %A Stocking, Martha %X

It is unrealistic to suppose that standard item response theory (IRT) models will be appropriate for all the new and currently considered computer-based tests. In addition to developing new models, we also need to give attention to the possibility of constructing and analyzing new tests without the aid of strong models. Computerized adaptive testing currently relies heavily on IRT. Alternative, empirically based, nonparametric adaptive testing algorithms exist, but their properties are little known. This article introduces a nonparametric, tree-based algorithm for adaptive testing and shows that it may be superior to conventional, IRT-based adaptive testing in cases where the IRT assumptions are not satisfied. In particular, it shows that the tree-based approach clearly outperformed (one-dimensional) IRT when the pool was strongly two-dimensional.

%B Journal of Educational and Behavioral Statistics %V 29 %P 293-316 %U http://jeb.sagepub.com/cgi/content/abstract/29/3/293 %R 10.3102/10769986029003293 %0 Generic %D 2004 %T The AMC Linear Disability Score project in a population requiring residential care: psychometric properties %A Holman, R. %A Lindeboom, R. %A Vermeulen, M. %A de Haan, R. J. %K *Disability Evaluation %K *Health Status Indicators %K Activities of Daily Living/*classification %K Adult %K Aged %K Aged, 80 and over %K Data Collection/methods %K Female %K Humans %K Logistic Models %K Male %K Middle Aged %K Netherlands %K Pilot Projects %K Probability %K Psychometrics/*instrumentation %K Questionnaires/standards %K Residential Facilities/*utilization %K Severity of Illness Index %X BACKGROUND: Currently there is a lot of interest in the flexible framework offered by item banks for measuring patient relevant outcomes, including functional status. However, there are few item banks, which have been developed to quantify functional status, as expressed by the ability to perform activities of daily life. METHOD: This paper examines the psychometric properties of the AMC Linear Disability Score (ALDS) project item bank using an item response theory model and full information factor analysis. Data were collected from 555 respondents on a total of 160 items. RESULTS: Following the analysis, 79 items remained in the item bank. The remaining 81 items were excluded because of: difficulties in presentation (1 item); low levels of variation in response pattern (28 items); significant differences in measurement characteristics for males and females or for respondents under or over 85 years old (26 items); or lack of model fit to the data at item level (26 items). CONCLUSIONS: It is conceivable that the item bank will have different measurement characteristics for other patient or demographic populations. 
However, these results indicate that the ALDS item bank has sound psychometric properties for respondents in residential care settings and could form a stable base for measuring functional status in a range of situations, including the implementation of computerised adaptive testing of functional status. %B Health and Quality of Life Outcomes %7 2004/08/05 %V 2 %P 42 %8 Aug 3 %@ 1477-7525 (Electronic) 1477-7525 (Linking) %G eng %M 15291958 %2 514531 %0 Book Section %D 2004 %T Computer-adaptive testing %A Luecht, RM %C B. Everett, and D. Howell (Eds.), Encyclopedia of statistics in behavioral science. New York: Wiley. %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2004 %T Constraining Item Exposure in Computerized Adaptive Testing With Shadow Tests %A van der Linden, Wim J. %A Veldkamp, Bernard P. %X

Item-exposure control in computerized adaptive testing is implemented by imposing item-ineligibility constraints on the assembly process of the shadow tests. The method resembles Sympson and Hetter’s (1985) method of item-exposure control in that the decisions to impose the constraints are probabilistic. The method does not, however, require time-consuming simulation studies to set values for control parameters before the operational use of the test. Instead, it can set the probabilities of item ineligibility adaptively during the test using the actual item-exposure rates. An empirical study using an item pool from the Law School Admission Test showed that application of the method yielded perfect control of the item-exposure rates and had negligible impact on the bias and mean-squared error functions of the ability estimator.

%B Journal of Educational and Behavioral Statistics %V 29 %P 273-291 %U http://jeb.sagepub.com/cgi/content/abstract/29/3/273 %R 10.3102/10769986029003273 %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2004 %T The context effects of multidimensional CAT on the accuracy of multidimensional abilities and the item exposure rates %A Li, Y. H. %A Schafer, W. D. %B Paper presented at the annual meeting of the American Educational Research Association %C San Diego CA %G eng %0 Journal Article %J Computers and Education %D 2004 %T The development and evaluation of a software prototype for computer-adaptive testing %A Lilley, M %A Barker, T %A Britton, C %K computerized adaptive testing %B Computers and Education %V 43 %P 109-123 %G eng %0 Journal Article %J ReCALL %D 2004 %T Évaluation et multimédia dans l'apprentissage d'une L2 [Assessment and multimedia in learning an L2] %A Laurier, M. %K Adaptive Testing %K Computer Assisted Instruction %K Educational %K Foreign Language Learning %K Program Evaluation %K Technology computerized adaptive testing %X In the first part of this paper different areas where technology may be used for second language assessment are described. First, item banking operations, which are generally based on item Response Theory but not necessarily restricted to dichotomously scored items, facilitate assessment task organization and require technological support. Second, technology may help to design more authentic assessment tasks or may be needed in some direct testing situations. Third, the assessment environment may be more adapted and more stimulating when technology is used to give the student more control. The second part of the paper presents different functions of assessment. The monitoring function (often called formative assessment) aims at adapting the classroom activities to students and to provide continuous feedback. 
Technology may be used to train the teachers in monitoring techniques, to organize data or to produce diagnostic information; electronic portfolios or quizzes that are built in some educational software may also be used for monitoring. The placement function is probably the one in which the application of computer adaptive testing procedures (e.g. French CAPT) is the most appropriate. Automatic scoring devices may also be used for placement purposes. Finally the certification function requires more valid and more reliable tools. Technology may be used to enhance the testing situation (to make it more authentic) or to facilitate data processing during the construction of a test. Almond et al. (2002) propose a four component model (Selection, Presentation, Scoring and Response) for designing assessment systems. Each component must be planned taking into account the assessment function. %B ReCALL %V 16 %P 475-487 %G eng %0 Conference Proceedings %B annual meeting of the American Educational Research Association %D 2004 %T An investigation of two combination procedures of SPRT for three-category classification decisions in computerized classification test %A Jiao, H. %A Wang, S %A Lau, CA %K computerized adaptive testing %K Computerized classification testing %K sequential probability ratio testing %B annual meeting of the American Educational Research Association %C San Antonio, Texas %8 04/2004 %G eng %0 Generic %D 2004 %T An investigation of two combination procedures of SPRT for three-category decisions in computerized classification test %A Jiao, H. %A Wang, S %A Lau, A %C Paper presented at the annual meeting of the American Educational Research Association, San Diego CA %G eng %0 Book Section %B Intelligent Tutoring Systems %D 2004 %T A Learning Environment for English for Academic Purposes Based on Adaptive Tests and Task-Based Systems %A Gonçalves, Jean P. %A Aluisio, Sandra M. %A de Oliveira, Leandro H.M. %A Oliveira Jr., Osvaldo N. %E Lester, James C. 
%E Vicari, Rosa Maria %E Paraguaçu, Fábio %B Intelligent Tutoring Systems %S Lecture Notes in Computer Science %I Springer Berlin / Heidelberg %V 3220 %P 1-11 %@ 978-3-540-22948-3 %G eng %U http://dx.doi.org/10.1007/978-3-540-30139-4_1 %R 10.1007/978-3-540-30139-4_1 %0 Generic %D 2004 %T Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project %A Holman, R. %A Glas, C. A. %A Lindeboom, R. %A Zwinderman, A. H. %A de Haan, R. J. %K *Disability Evaluation %K *Health Surveys %K *Logistic Models %K *Questionnaires %K Activities of Daily Living/*classification %K Data Interpretation, Statistical %K Health Status %K Humans %K Pilot Projects %K Probability %K Quality of Life %K Severity of Illness Index %X BACKGROUND: Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. METHODS: The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. RESULTS: The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. 
The estimates obtained using the cold deck imputation method were substantially different. CONCLUSIONS: The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. %B Health and Quality of Life Outcomes %7 2004/06/18 %V 2 %P 29 %8 Jun 16 %@ 1477-7525 (Electronic) 1477-7525 (Linking) %G eng %9 Comparative Study; Research Support, Non-U.S. Gov't %M 15200681 %2 441407 %0 Journal Article %J Medical Care %D 2004 %T Refining the conceptual basis for rehabilitation outcome measurement: personal care and instrumental activities domain %A Coster, W. J. %A Haley, S. M. %A Andres, P. L. %A Ludlow, L. H. %A Bond, T. L. %A Ni, P. S. %K *Self Efficacy %K *Sickness Impact Profile %K Activities of Daily Living/*classification/psychology %K Adult %K Aged %K Aged, 80 and over %K Disability Evaluation %K Factor Analysis, Statistical %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods/statistics & numerical data %K Questionnaires/*standards %K Recovery of Function/physiology %K Rehabilitation/*standards/statistics & numerical data %K Reproducibility of Results %K Research Support, U.S. Gov't, Non-P.H.S. %K Research Support, U.S. Gov't, P.H.S. %K Sensitivity and Specificity %X BACKGROUND: Rehabilitation outcome measures routinely include content on performance of daily activities; however, the conceptual basis for item selection is rarely specified. These instruments differ significantly in format, number, and specificity of daily activity items and in the measurement dimensions and type of scale used to specify levels of performance.
We propose that a requirement for upper limb and hand skills underlies many activities of daily living (ADL) and instrumental activities of daily living (IADL) items in current instruments, and that items selected based on this definition can be placed along a single functional continuum. OBJECTIVE: To examine the dimensional structure and content coverage of a Personal Care and Instrumental Activities item set and to examine the comparability of items from existing instruments and a set of new items as measures of this domain. METHODS: Participants (N = 477) from 3 different disability groups and 4 settings representing the continuum of postacute rehabilitation care were administered the newly developed Activity Measure for Post-Acute Care (AM-PAC), the SF-8, and an additional setting-specific measure: FIM (in-patient rehabilitation); MDS (skilled nursing facility); MDS-PAC (postacute settings); OASIS (home care); or PF-10 (outpatient clinic). Rasch (partial-credit model) analyses were conducted on a set of 62 items covering the Personal Care and Instrumental domain to examine item fit, item functioning, and category difficulty estimates and unidimensionality. RESULTS: After removing 6 misfitting items, the remaining 56 items fit acceptably along the hypothesized continuum. Analyses yielded different difficulty estimates for the maximum score (eg, "Independent performance") for items with comparable content from different instruments. Items showed little differential item functioning across age, diagnosis, or severity groups, and 92% of the participants fit the model. CONCLUSIONS: ADL and IADL items from existing rehabilitation outcomes instruments that depend on skilled upper limb and hand use can be located along a single continuum, along with the new personal care and instrumental items of the AM-PAC addressing gaps in content. 
Results support the validity of the proposed definition of the Personal Care and Instrumental Activities dimension of function as a guide for future development of rehabilitation outcome instruments, such as linked, setting-specific short forms and computerized adaptive testing approaches. %B Medical Care %V 42 %P I62-172 %8 Jan %G eng %M 14707756 %0 Journal Article %J Metodologia de Las Ciencias del Comportamiento. %D 2004 %T Statistics for detecting disclosed items in a CAT environment %A Lu, Y. %A Hambleton, R. K. %B Metodologia de Las Ciencias del Comportamiento. %V 5 %G eng %N 2 %& pp. 225-242 %0 Journal Article %J Language Learning %D 2004 %T Testing vocabulary knowledge: Size, strength, and computer adaptiveness %A Laufer, B. %A Goldstein, Z. %X (from the journal abstract) In this article, we describe the development and trial of a bilingual computerized test of vocabulary size, the number of words the learner knows, and strength, a combination of four aspects of knowledge of meaning that are assumed to constitute a hierarchy of difficulty: passive recognition (easiest), active recognition, passive recall, and active recall (hardest). The participants were 435 learners of English as a second language. We investigated whether the above hierarchy was valid and which strength modality correlated best with classroom language performance. Results showed that the hypothesized hierarchy was present at all word frequency levels, that passive recall was the best predictor of classroom language performance, and that growth in vocabulary knowledge was different for the different strength modalities. (PsycINFO Database Record (c) 2004 APA, all rights reserved). %B Language Learning %V 54 %P 399-436 %8 Sep %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2003 %T Accuracy of reading and mathematics ability estimates under the shadow-test constraint MCAT %A Li, Y. H. %A Schafer, W. D.
%B Paper presented at the annual meeting of the American Educational Research Association %C Chicago IL %G eng %0 Book Section %D 2003 %T Adaptive exploration of assessment results under uncertainty %A Lamboudis, D. %A Economides, A. A. %A Papastergiou, A. %C Proceedings 3rd IEEE International Conference on Advanced Learning Technologies, ICALT '03, 460-461, 2003. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T The assembly of multiple form structures %A Armstrong, R. D. %A Little, J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T A Bayesian method for the detection of item preknowledge in computerized adaptive testing %A McLeod, L. %A Lewis, C. %A Thissen, D. %K Adaptive Testing %K Cheating %K Computer Assisted Testing %K Individual Differences computerized adaptive testing %K Item %K Item Analysis (Statistical) %K Mathematical Modeling %K Response Theory %X With the increased use of continuous testing in computerized adaptive testing, new concerns about test security have evolved, such as how to ensure that items in an item pool are safeguarded from theft. In this article, procedures to detect test takers using item preknowledge are explored. When test takers use item preknowledge, their item responses deviate from the underlying item response theory (IRT) model, and estimated abilities may be inflated. This deviation may be detected through the use of person-fit indices. A Bayesian posterior log odds ratio index is proposed for detecting the use of item preknowledge. In this approach to person fit, the estimated probability that each test taker has preknowledge of items is updated after each item response.
These probabilities are based on the IRT parameters, a model specifying the probability that each item has been memorized, and the test taker's item responses. Simulations based on an operational computerized adaptive test (CAT) pool are used to demonstrate the use of the odds ratio index. (PsycINFO Database Record (c) 2005 APA) %B Applied Psychological Measurement %V 27 %P 121-137 %G eng %0 Journal Article %J Clinical Therapeutics %D 2003 %T Can an item response theory-based pain item bank enhance measurement precision? %A Lai, J-S. %A Dineen, K. %A Cella, D. %A Von Roenn, J. %B Clinical Therapeutics %V 25 %P D34-D36 %G eng %M 14568660 %! Clin Ther %0 Journal Article %J The Journal of Technology, Learning and Assessment %D 2003 %T Computerized adaptive testing: A comparison of three content balancing methods %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %X Content balancing is often a practical consideration in the design of computerized adaptive testing (CAT). This study compared three content balancing methods, namely, the constrained CAT (CCAT), the modified constrained CAT (MCCAT), and the modified multinomial model (MMM), under various conditions of test length and target maximum exposure rate. Results of a series of simulation studies indicate that there is no systematic effect of content balancing method in measurement efficiency and pool utilization. However, among the three methods, the MMM appears to consistently over-expose fewer items. %B The Journal of Technology, Learning and Assessment %V 2 %P 1-15 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Computerized adaptive testing: A comparison of three content balancing methods %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %A Wen, Z.
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Computerized adaptive testing using the nearest-neighbors criterion %A Cheng, P. E. %A Liou, M. %K (Statistical) %K Adaptive Testing %K Computer Assisted Testing %K Item Analysis %K Item Response Theory %K Statistical Analysis %K Statistical Estimation computerized adaptive testing %K Statistical Tests %X Item selection procedures designed for computerized adaptive testing need to accurately estimate every taker's trait level (θ) and, at the same time, effectively use all items in a bank. Empirical studies showed that classical item selection procedures based on maximizing Fisher or other related information yielded highly varied item exposure rates; with these procedures, some items were frequently used whereas others were rarely selected. In the literature, methods have been proposed for controlling exposure rates; they tend to affect the accuracy in θ estimates, however. A modified version of the maximum Fisher information (MFI) criterion, coined the nearest neighbors (NN) criterion, is proposed in this study. The NN procedure improves to a moderate extent the undesirable item exposure rates associated with the MFI criterion and keeps sufficient precision in estimates. The NN criterion will be compared with a few other existing methods in an empirical study using the mean squared errors in θ estimates and plots of item exposure rates associated with different distributions.
%B Applied Psychological Measurement %V 27 %P 204-216 %G eng %0 Journal Article %J Journal of Applied Measurement %D 2003 %T Developing an initial physical function item bank from existing sources %A Bode, R. K. %A Cella, D. %A Lai, J. S. %A Heinemann, A. W. %K *Databases %K *Sickness Impact Profile %K Adaptation, Psychological %K Data Collection %K Humans %K Neoplasms/*physiopathology/psychology/therapy %K Psychometrics %K Quality of Life/*psychology %K Research Support, U.S. Gov't, P.H.S. %K United States %X The objective of this article is to illustrate incremental item banking using health-related quality of life data collected from two samples of patients receiving cancer treatment. The kinds of decisions one faces in establishing an item bank for computerized adaptive testing are also illustrated. Pre-calibration procedures include: identifying common items across databases; creating a new database with data from each pool; reverse-scoring "negative" items; identifying rating scales used in items; identifying pivot points in each rating scale; pivot anchoring items at comparable rating scale categories; and identifying items in each instrument that measure the construct of interest. A series of calibrations were conducted in which a small proportion of new items were added to the common core and misfitting items were identified and deleted until an initial item bank was developed. %B Journal of Applied Measurement %V 4 %P 124-136 %G eng %M 12748405 %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 2003 %T The effect of item selection method on the variability of CAT’s ability estimates when item parameters are contaminated with measurement errors %A Li, Y. H. %A Schafer, W. D.
%B Paper presented at the Annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Generic %D 2003 %T The effects of model misfit in computerized classification test %A Jiao, H. %A Lau, A. C. %C Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2003 %T Evaluating a new approach to detect aberrant responses in CAT %A Lu, Y. %A Robin, F. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2003 %T Evaluating computer-based test security by generalized item overlap rates %A Zhang, J. %A Lu, T. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Evaluating computerized adaptive testing design for the MCAT with realistic simulated data %A Lu, Y. %A Pitoniak, M. %A Rizavi, S. %A Way, W. D. %A Steffan, M. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 2003 %T Exposure control using adaptive multi-stage item bundles %A Luecht, RM %B Paper presented at the Annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Journal of Technology, Learning, and Assessment %D 2003 %T A feasibility study of on-the-fly item generation in adaptive testing %A Bejar, I. I. %A Lawless, R. R. %A Morley, M. E. %A Wagner, M. E. %A Bennett, R. E. %A Revuelta, J. %B Journal of Technology, Learning, and Assessment %V 2 %G eng %N 3 %0 Journal Article %J Educational and Psychological Measurement %D 2003 %T Incorporation of Content Balancing Requirements in Stratification Designs for Computerized Adaptive Testing %A Leung, Chi-Keung %A Chang, Hua-Hua %A Hau, Kit-Tai %X

In computerized adaptive testing, the multistage a-stratified design advocates a new philosophy on pool management and item selection in which, contrary to common practice, less discriminating items are used first. The method is effective in reducing item-overlap rate and enhancing pool utilization. This stratification method has been extended in different ways to deal with the practical issues of content constraints and the positive correlation between item difficulty and discrimination. Nevertheless, these modified designs on their own do not automatically satisfy content requirements. In this study, three stratification designs were examined in conjunction with three well-developed content balancing methods. The performance of each of these nine combinational methods was evaluated in terms of their item security, measurement efficiency, and pool utilization. Results showed substantial differences in item-overlap rate and pool utilization among different methods. An optimal combination of stratification design and content balancing method is recommended.

%B Educational and Psychological Measurement %V 63 %P 257-270 %U http://epm.sagepub.com/content/63/2/257.abstract %R 10.1177/0013164403251326 %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2003 %T Increasing the homogeneity of CAT’s item-exposure rates by minimizing or maximizing varied target functions while assembling shadow tests %A Li, Y. H. %A Schafer, W. D. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago IL %G eng %0 Journal Article %J Quality of Life Research %D 2003 %T Item banking to improve, shorten and computerize self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale %A Lai, J-S. %A Crane, P. K. %A Cella, D. %A Chang, C-H. %A Bode, R. K. %A Heinemann, A. W. %K *Health Status Indicators %K *Questionnaires %K Adult %K Fatigue/*diagnosis/etiology %K Female %K Humans %K Male %K Middle Aged %K Neoplasms/complications %K Psychometrics %K Research Support, Non-U.S. Gov't %K Research Support, U.S. Gov't, P.H.S. %K Sickness Impact Profile %X Fatigue is a common symptom among cancer patients and the general population. Due to its subjective nature, fatigue has been difficult to effectively and efficiently assess. Modern computerized adaptive testing (CAT) can enable precise assessment of fatigue using a small number of items from a fatigue item bank. CAT enables brief assessment by selecting questions from an item bank that provide the maximum amount of information given a person's previous responses. This article illustrates steps to prepare such an item bank, using 13 items from the Functional Assessment of Chronic Illness Therapy Fatigue Subscale (FACIT-F) as the basis. Samples included 1022 cancer patients and 1010 people from the general population. An Item Response Theory (IRT)-based rating scale model, a polytomous extension of the Rasch dichotomous model, was utilized.
Nine items demonstrating acceptable psychometric properties were selected and positioned on the fatigue continuum. The fatigue levels measured by these nine items along with their response categories covered 66.8% of the general population and 82.6% of the cancer patients. Although the operational CAT algorithms to handle polytomously scored items are still in progress, we illustrated how CAT may work by using nine core items to measure level of fatigue. Using this illustration, a fatigue measure comparable to its full-length 13-item scale administration was obtained using four items. The resulting item bank can serve as a core to which will be added a psychometrically sound and operational item bank covering the entire fatigue continuum. %B Quality of Life Research %V 12 %P 485-501 %8 Aug %G eng %M 13677494 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Methods for item set selection in adaptive testing %A Lu, Y. %A Rizavi, S. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2003 %T Multidimensional computerized adaptive testing in recovering reading and mathematics abilities %A Li, Y. H. %A Schafer, W. D. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago, IL %G eng %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 2003 %T Test information targeting strategies for adaptive multistage testlet designs %A Luecht, RM %A Burgin, W. L.
%B Paper presented at the Annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Accuracy of the ability estimate and the item exposure rate under multidimensional adaptive testing with item constraints %A Li, Y. H. %A Yu, N. Y. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Generic %D 2002 %T Adaptive testing without IRT in the presence of multidimensionality (Research Report 02-09) %A Yan, D. %A Lewis, C. %A Stocking, M. %C Princeton NJ: Educational Testing Service %G eng %0 Journal Article %J Seminars in Oncology %D 2002 %T Advances in quality of life measurements in oncology patients %A Cella, D. %A Chang, C-H. %A Lai, J. S. %A Webster, K. %K *Quality of Life %K *Sickness Impact Profile %K Cross-Cultural Comparison %K Culture %K Humans %K Language %K Neoplasms/*physiopathology %K Questionnaires %X Accurate assessment of the quality of life (QOL) of patients can provide important clinical information to physicians, especially in the area of oncology. Changes in QOL are important indicators of the impact of a new cytotoxic therapy, can affect a patient's willingness to continue treatment, and may aid in defining response in the absence of quantifiable endpoints such as tumor regression. Because QOL is becoming an increasingly important aspect in the management of patients with malignant disease, it is vital that the instruments used to measure QOL are reliable and accurate. Assessment of QOL involves a multidimensional approach that includes physical, functional, social, and emotional well-being, and the most comprehensive instruments measure at least three of these domains. 
Instruments to measure QOL can be generic (eg, the Nottingham Health Profile), targeted toward specific illnesses (eg, Functional Assessment of Cancer Therapy - Lung), or be a combination of generic and targeted. Two of the most widely used examples of the combination, or hybrid, instruments are the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 Items and the Functional Assessment of Chronic Illness Therapy. A consequence of the increasing international collaboration in clinical trials has been the growing necessity for instruments that are valid across languages and cultures. To assure the continuing reliability and validity of QOL instruments in this regard, item response theory can be applied. Techniques such as item response theory may be used in the future to construct QOL item banks containing large sets of validated questions that represent various levels of QOL domains. As QOL becomes increasingly important in understanding and approaching the overall management of cancer patients, the tools available to clinicians and researchers to assess QOL will continue to evolve. While the instruments currently available provide reliable and valid measurement, further improvements in precision and application are anticipated. %B Seminars in Oncology %V 29 %P 60-68 %8 Jun %G eng %M 12082656 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2002 %T Comparing three item selection approaches for computerized adaptive testing with content balancing requirement %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2002 %T A comparison of computer mastery models when pool characteristics vary %A Smith, R. L. %A Lewis, C.
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Journal Article %J British Journal of Educational Technology %D 2002 %T Computerised adaptive testing %A Latu, E. %A Chapman, E. %K computerized adaptive testing %X Considers the potential of computer adaptive testing (CAT). Discusses the use of CAT instead of traditional paper and pencil tests, identifies decisions that impact the efficacy of CAT, and concludes that CAT is beneficial when used to its full potential on certain types of tests. %B British Journal of Educational Technology %V 33 %P 619-622 %G eng %M EJ657892 %0 Conference Paper %B Paper presented at the 2002 Computer-Assisted Testing Conference %D 2002 %T The development and evaluation of a computer-adaptive testing application for English language %A Lilley, M %A Barker, T %B Paper presented at the 2002 Computer-Assisted Testing Conference %C United Kingdom %G eng %0 Journal Article %J Quality of Life Research %D 2002 %T Feasibility and acceptability of computerized adaptive testing (CAT) for fatigue monitoring in clinical practice %A Davis, K. M. %A Chang, C-H. %A Lai, J-S. %A Cella, D. %B Quality of Life Research %V 11 %N 7 %P 134 %G eng %0 Generic %D 2002 %T A feasibility study of on-the-fly item generation in adaptive testing (GRE Board Report No 98-12) %A Bejar, I. I. %A Lawless, R. R. %A Morley, M. E. %A Wagner, M. E. %A Bennett, R. E. %A Revuelta, J. %C Educational Testing Service RR02-23. Princeton NJ: Educational Testing Service. %0 Journal Article %J Applied Psychological Measurement %D 2002 %T Item selection in computerized adaptive testing: Improving the a-stratified design with the Sympson-Hetter algorithm %A Leung, C. K. %A Chang, Hua-Hua %A Hau, K. T.
%B Applied Psychological Measurement %V 26 %P 376-392 %@ 0146-6216 %G eng %0 Generic %D 2002 %T MIRTCAT [computer software] %A Li, Y. H. %C Upper Marlboro MD: Author %G eng %0 Conference Paper %B (Original title: Detecting item misfit in computerized adaptive testing.) Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Statistical indexes for monitoring item behavior under computer adaptive testing environment %A Zhu, R. %A Yu, F. %A Liu, S. M. %B (Original title: Detecting item misfit in computerized adaptive testing.) Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Generic %D 2002 %T A strategy for controlling item exposure in multidimensional computerized adaptive testing %A Lee, Y. H. %A Ip, E.H. %A Fuh, C.D. %C Available from http://www3.stat.sinica.edu.tw/library/c_tec_rep/c-2002-11.pdf %G eng %0 Book Section %D 2002 %T Test models for complex computer-based testing %A Luecht, RM %A Clauser, B. E. %C C. N. Mills, M. T. Potenza, J. J. Fremer, and W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 67-88). Hillsdale NJ: Erlbaum. %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the National Council on Measurement in Education. %D 2002 %T A testlet assembly design for the uniform CPA examination %A Luecht, RM %A Brumfield, T. %A Breithaupt, K %B Paper presented at the Annual Meeting of the National Council on Measurement in Education.
%C New Orleans %G eng %0 Journal Article %J Applied Psychological Measurement %D 2001 %T Computerized adaptive testing with the generalized graded unfolding model %A Roberts, J. S. %A Lin, Y. %A Laughlin, J. E. %K Attitude Measurement %K College Students computerized adaptive testing %K Computer Assisted Testing %K Item Response %K Models %K Statistical Estimation %K Theory %X Examined the use of the generalized graded unfolding model (GGUM) in computerized adaptive testing. The objective was to minimize the number of items required to produce equiprecise estimates of person locations. Simulations based on real data about college student attitudes toward abortion and on data generated to fit the GGUM were used. It was found that as few as 7 or 8 items were needed to produce accurate and precise person estimates using an expected a posteriori procedure. The number of items in the item bank (20, 40, or 60 items) and their distribution on the continuum (uniform locations or item clusters in moderately extreme locations) had only small effects on the accuracy and precision of the estimates. These results suggest that adaptive testing with the GGUM is a good method for achieving estimates with an approximately uniform level of precision using a small number of items. %B Applied Psychological Measurement %V 25 %P 177-196 %G eng %0 Journal Article %J American Journal of Preventive Medicine %D 2001 %T Development of an adaptive multimedia program to collect patient health data %A Sutherland, L. A. %A Campbell, M. %A Ornstein, K. %A Wildemuth, B. %A Lobach, D. %B American Journal of Preventive Medicine %V 21 %P 320-324 %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the American Educational Research Association %D 2001 %T An examination of item selection rules by stratified CAT designs integrated with content balancing methods %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T.
%B Paper presented at the Annual Meeting of the American Educational Research Association %C Seattle WA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T Impact of item location effects on ability estimation in CAT %A Liu, M. %A Zhu, R. %A Guo, F. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %D 2001 %T Integrating stratification and information approaches for multiple constrained CAT %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Journal Article %J Journal of Applied Measurement %D 2001 %T Polytomous modeling of cognitive errors in computer adaptive testing %A Wang, L.-S. %A Li, Chun-Shan %X Used Monte Carlo simulation to compare the relative measurement efficiency of polytomous modeling and dichotomous modeling under different scoring schemes and termination criteria. Results suggest that polytomous computerized adaptive testing (CAT) yields marginal gains over dichotomous CAT when termination criteria are more stringent. Discusses conditions under which polytomous CAT cannot prevent the nonuniform gain in test information. %B Journal of Applied Measurement %V 2 %N 4 %G eng %0 Book Section %D 2000 %T Computer-adaptive sequential testing %A Luecht, RM %A Nungester, R. J. %C W. J. van der Linden (Ed.), Computerized Adaptive Testing: Theory and Practice (pp. 289-209). Dordrecht, The Netherlands: Kluwer. %G eng %0 Book Section %B Development of Computerised Middle School Achievement Tests %D 2000 %T Computer-adaptive testing: A methodology whose time has come %A Linacre, J. M. %E Kang, U. %E Jean, E. %E Linacre, J. M.
%K computerized adaptive testing %B Development of Computerised Middle School Achievement Tests %I MESA %C Chicago, IL. USA %V 69 %G eng %0 Generic %D 2000 %T Computer-adaptive testing: A methodology whose time has come. MESA Memorandum No 9 %A Linacre, J. M. %C Chicago: MESA Psychometric Laboratory, University of Chicago. %G eng %0 Journal Article %J AANA Journal %D 2000 %T Computerized adaptive administration of the self-evaluation examination %A LaVelle, T. %A Zaglaniczny, K. %A Spitzer, L. E. %B AANA Journal %V 68 %P 226-231 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2000 %T Content balancing in stratified computerized adaptive testing designs %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans, LA %G eng %0 Generic %D 2000 %T Effects of item-selection criteria on classification testing with the sequential probability ratio test (Research Report 2000-8) %A Lin, C.-J. %A Spray, J. A. %C Iowa City, IA: American College Testing %G eng %0 Generic %D 2000 %T Estimating item parameters from classical indices for item pool development with a computerized classification test (Research Report 2000-4) %A Huang, C.-Y. %A Kalohn, J.C. %A Lin, C.-J. %A Spray, J. %C Iowa City IA: ACT Inc %G eng %0 Journal Article %J Applied Psychological Measurement %D 2000 %T Estimation of trait level in computerized adaptive testing %A Cheng, P.
E. %A Liou, M. %K (Statistical) %K Adaptive Testing %K Computer Assisted Testing %K Item Analysis %K Statistical Estimation computerized adaptive testing %X Notes that in computerized adaptive testing (CAT), an examinee's trait level (θ) must be estimated with reasonable accuracy based on a small number of item responses. A successful implementation of CAT depends on (1) the accuracy of statistical methods used for estimating θ and (2) the efficiency of the item-selection criterion. Methods of estimating θ suitable for CAT are reviewed, and the differences between Fisher and Kullback-Leibler information criteria for selecting items are discussed. The accuracy of different CAT algorithms was examined in an empirical study. The results show that correcting θ estimates for bias was necessary at earlier stages of CAT, but most CAT algorithms performed equally well for tests of 10 or more items. %B Applied Psychological Measurement %V 24 %P 257-265 %G eng %0 Journal Article %J Journal of Applied Measurement %D 2000 %T The impact of receiving the same items on consecutive computer adaptive test administrations %A O'Neill, T. %A Lunz, M. E. %A Thiede, K. %X Addresses item exposure in a Computerized Adaptive Test (CAT) when the item selection algorithm is permitted to present examinees with questions that they have already been asked in a previous test administration. The data were from a national certification exam in medical technology. The responses of 178 repeat examinees were compared. The results indicate that the combined use of an adaptive algorithm to select items and latent trait theory to estimate person ability provides substantial protection from score contamination. The implications for constraints that prohibit examinees from seeing an item twice are discussed.
%B Journal of Applied Measurement %V 1 %P 131-151 %G eng %0 Conference Paper %B Symposium paper presented at the Annual Meeting of the National Council on Measurement in Education %D 2000 %T Implementing the computer-adaptive sequential testing (CAST) framework to mass produce high quality computer-adaptive and mastery tests %A Luecht, RM %B Symposium paper presented at the Annual Meeting of the National Council on Measurement in Education %C New Orleans, LA %G eng %0 Book Section %D 2000 %T Methods of controlling the exposure of items in CAT %A Stocking, M. L. %A Lewis, C. %C W. J. van der Linden and C. A. W. Glas (eds.), Computerized adaptive testing: Theory and practice (pp. 163-182). Norwell MA: Kluwer. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2000 %T A new item selection procedure for mixed item type in computerized classification testing %A Lau, C. %A Wang, T. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Solving complex constraints in a-stratified computerized adaptive testing designs %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans, USA %G eng %0 Book Section %B Innovations in computerized assessment %D 1999 %T CAT for certification and licensure %A Bergstrom, Betty A. %A Lunz, M. E. %K computerized adaptive testing %X (from the chapter) This chapter discusses implementing computerized adaptive testing (CAT) for high-stakes examinations that determine whether or not a particular candidate will be certified or licensed.
The experience of several boards who have chosen to administer their licensure or certification examinations using the principles of CAT illustrates the process of moving into this mode of administration. Examples of the variety of options that can be utilized within a CAT administration are presented, the decisions that boards must make to implement CAT are discussed, and a timetable for completing the tasks that need to be accomplished is provided. In addition to the theoretical aspects of CAT, practical issues and problems are reviewed. %B Innovations in computerized assessment %I Lawrence Erlbaum Associates %C Mahwah, N.J. %P 67-91 %G eng %0 Conference Proceedings %B annual meeting of the American Educational Research Association %D 1999 %T Computerized classification testing under practical constraints with a polytomous model %A Lau, C. A. %A Wang, T. %B annual meeting of the American Educational Research Association %C Montreal, Quebec, Canada %8 04/1999 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1999 %T Detecting item memorization in the CAT environment %A McLeod, L. D. %A Lewis, C. %B Applied Psychological Measurement %V 23 %P 147-160 %G eng %0 Journal Article %J Studies in language testing %D 1999 %T The development of an adaptive test for placement in French %A Laurier, M. %B Studies in language testing %V 10 %P 122-135 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1999 %T An enhanced stratified computerized adaptive testing design %A Leung, C-K.
%A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the American Educational Research Association %C Montreal, Canada %G eng %0 Conference Paper %B Paper presented at the annual meeting of the *?*. %D 1999 %T Formula score and direct optimization algorithms in CAT ASVAB on-line calibration %A Levine, M. V. %A Krass, I. A. %B Paper presented at the annual meeting of the *?*. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Impact of flawed items on ability estimation in CAT %A Liu, M. %A Steffen, M. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Report %D 1999 %T Incorporating content constraints into a multi-stage adaptive testlet design. %A Reese, L. M. %A Schnipke, D. L. %A Luebke, S. W. %X Most large-scale testing programs facing computerized adaptive testing (CAT) must face the challenge of maintaining extensive content requirements, but content constraints in CAT can compromise the precision and efficiency that could be achieved by a pure maximum information adaptive testing algorithm. This simulation study first evaluated whether realistic content constraints could be met by carefully assembling testlets and appropriately selecting testlets for each test taker that, when combined, would meet the content requirements of the test and would be adapted to the test taker's ability level. The second focus of the study was to compare the precision of the content-balanced testlet design with that achieved by the current paper-and-pencil version of the test through data simulation. The results reveal that constraints to control for item exposure, testlet overlap, and efficient pool utilization need to be incorporated into the testlet assembly algorithm. More refinement of the statistical constraints for testlet assembly is also necessary.
However, even for this preliminary attempt at assembling content-balanced testlets, the two-stage computerized test simulated with these testlets performed quite well. %B LSAC Computerized Testing Report %I Law School Admission Council %C Princeton, NJ. USA %@ Series %G eng %M ED467816 %0 Conference Paper %B Paper presented at the Annual Meeting of the American Educational Research Association %D 1999 %T Item selection in computerized adaptive testing: improving the a-stratified design with the Sympson-Hetter algorithm %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the Annual Meeting of the American Educational Research Association %C Montreal, CA %G eng %0 Journal Article %J Applied Psychological Measurement %D 1999 %T Reducing bias in CAT trait estimation: A comparison of approaches %A Wang, T. %A Hanson, B. H. %A Lau, C.-M. H. %B Applied Psychological Measurement %V 23 %P 263-278 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Study of methods to detect aberrant response patterns in computerized testing %A Iwamoto, C. K. %A Nungester, R. J. %A Luecht, RM %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Book Section %D 1999 %T Testing adaptatif et évaluation des processus cognitifs %A Laurier, M. %C C. Depover and B. Noël (Éds) : L’évaluation des compétences et des processus cognitifs - Modèles, pratiques et contextes. Bruxelles : De Boeck Université. %G eng %0 Journal Article %J American Journal of Occupational Therapy %D 1999 %T The use of Rasch analysis to produce scale-free measurement of functional ability %A Velozo, C. A. %A Kielhofner, G. %A Lai, J-S.
%K *Activities of Daily Living %K Disabled Persons/*classification %K Human %K Occupational Therapy/*methods %K Predictive Value of Tests %K Questionnaires/standards %K Sensitivity and Specificity %X Innovative applications of Rasch analysis can lead to solutions for traditional measurement problems and can produce new assessment applications in occupational therapy and health care practice. First, Rasch analysis is a mechanism that translates scores across similar functional ability assessments, thus enabling the comparison of functional ability outcomes measured by different instruments. This will allow for the meaningful tracking of functional ability outcomes across the continuum of care. Second, once the item-difficulty order of an instrument or item bank is established by Rasch analysis, computerized adaptive testing can be used to target items to the patient's ability level, reducing assessment length by as much as one half. More importantly, Rasch analysis can provide the foundation for "equiprecise" measurement or the potential to have precise measurement across all levels of functional ability. The use of Rasch analysis to create scale-free measurement of functional ability demonstrates how this methodology can be used in practical applications of clinical and outcome assessment. %B American Journal of Occupational Therapy %V 53 %P 83-90 %G eng %M 9926224 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1998 %T Adaptive testing without IRT %A Yan, D. %A Lewis, C. %A Stocking, M. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1998 %T A Bayesian approach to detection of item preknowledge in a CAT %A McLeod, L. D. %A Lewis, C.
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Generic %D 1998 %T CASTISEL [Computer software] %A Luecht, RM %C Philadelphia, PA: National Board of Medical Examiners %G eng %0 Generic %D 1998 %T Comparability of paper-and-pencil and computer adaptive test scores on the GRE General Test (GRE Board Professional Report No 95-08P; Educational Testing Service Research Report 98-38) %A Schaeffer, G. %A Bridgeman, B. %A Golub-Smith, M. L. %A Lewis, C. %A Potenza, M. T. %A Steffen, M. %C Princeton, NJ: Educational Testing Service %G eng %0 Report %D 1998 %T Comparability of paper-and-pencil and computer adaptive test scores on the GRE General Test %A Schaeffer, G. A. %A Bridgeman, B. %A Golub-Smith, M. L. %A Lewis, C. %A Potenza, M. T. %A Steffen, M. %I Educational Testing Services %C Princeton, N.J. %8 August, 1998 %@ ETS Research Report 98-38 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1998 %T Comparing and combining dichotomous and polytomous items with SPRT procedure in computerized classification testing %A Lau, CA %A Wang, T. %B Paper presented at the annual meeting of the American Educational Research Association %C San Diego %G eng %0 Journal Article %J Applied Psychological Measurement %D 1998 %T Computer-assisted test assembly using optimization heuristics %A Luecht, RM %B Applied Psychological Measurement %V 22 %P 224-236. %G eng %0 Journal Article %J Journal of Educational & Behavioral Statistics %D 1998 %T Controlling item exposure conditional on ability in computerized adaptive testing %A Stocking, M. L. %A Lewis, C. %X The interest in the application of large-scale adaptive testing for secure tests has served to focus attention on issues that arise when theoretical advances are made operational. 
One such issue is that of ensuring item and pool security in the continuous testing environment made possible by the computerized administration of a test, as opposed to the more periodic testing environment typically used for linear paper-and-pencil tests. This article presents a new method of controlling the exposure rate of items conditional on ability level in this continuous testing environment. The properties of such conditional control on the exposure rates of items, when used in conjunction with a particular adaptive testing algorithm, are explored through studies with simulated data. %B Journal of Educational & Behavioral Statistics %V 23 %P 57-75 %G eng %0 Generic %D 1998 %T Development and evaluation of online calibration procedures (TCN 96-216) %A Levine, M. L. %A Williams. %C Champaign IL: Algorithm Design and Measurement Services, Inc %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1998 %T Essentially unbiased Bayesian estimates in computerized adaptive testing %A Wang, T. %A Lau, C. %A Hanson, B. A. %B Paper presented at the annual meeting of the American Educational Research Association %C San Diego %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1998 %T Evaluation of methods for the use of underutilized items in a CAT environment %A Steffen, M. %A Liu, M. %B Paper presented at the annual meeting of the Psychometric Society %C Urbana, IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of National Council on Measurement in Education %D 1998 %T Expected losses for individuals in Computerized Mastery Testing %A Smith, R. %A Lewis, C.
%B Paper presented at the annual meeting of National Council on Measurement in Education %C San Diego %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the National Council for Measurement in Education %D 1998 %T A framework for exploring and controlling risks associated with test item exposure over time %A Luecht, RM %B Paper presented at the Annual Meeting of the National Council for Measurement in Education %C San Diego, CA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1998 %T The impact of scoring flawed items on ability estimation in CAT %A Liu, M. %A Steffen, M. %B Paper presented at the annual meeting of the Psychometric Society %C Urbana, IL %G eng %0 Journal Article %J Advances in Health Sciences Education %D 1998 %T Maintaining content validity in computerized adaptive testing %A Luecht, RM %A de Champlain, A. %A Nungester, R. J. %K computerized adaptive testing %X The authors empirically demonstrate some of the trade-offs which can occur when content balancing is imposed in computerized adaptive testing (CAT) forms or, conversely, when it is ignored. The authors contend that the content validity of a CAT form can actually change across a score scale when content balancing is ignored. However, they caution that efficiency and score precision can be severely reduced by over-specifying content restrictions in a CAT form. The results from 2 simulation studies are presented as a means of highlighting some of the trade-offs that could occur between content and statistical considerations in CAT form assembly. (PsycINFO Database Record (c) 2003 APA, all rights reserved). %B Advances in Health Sciences Education %V 3 %P 29-41 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1998 %T A new approach for the detection of item preknowledge in computerized adaptive testing %A McLeod, L. D. %A Lewis, C.
%B Paper presented at the annual meeting of the Psychometric Society %C Urbana, IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1998 %T Patterns of item exposure using a randomized CAT algorithm %A Lunz, M. E. %A Stahl, J. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego, CA %G eng %0 Journal Article %J Journal of Educational Measurement %D 1998 %T Some practical examples of computerized adaptive sequential testing %A Luecht, RM %A Nungester, R. J. %B Journal of Educational Measurement %V 35 %P 229-249 %G eng %0 Journal Article %J Intelligence %D 1998 %T Testing word knowledge by telephone to estimate general cognitive aptitude using an adaptive test %A Legree, P. J. %A Fischl, M. A %A Gade, P. A. %A Wilson, M. %B Intelligence %V 26 %P 91-98 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Computer assembly of tests so that content reigns supreme %A Case, S. M. %A Luecht, RM %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Generic %D 1997 %T Computerized adaptive testing through the World Wide Web %A Shermis, M. D. %A Mzumara, H. %A Brown, M. %A Lillig, C. %C (ERIC No. ED414536) %G eng %0 Generic %D 1997 %T Incorporating content constraints into a multi-stage adaptive testlet design: LSAC report %A Reese, L. M. %A Schnipke, D. L. %A Luebke, S. W. %C Newtown, PA: Law School Admission Council %G eng %0 Conference Paper %B Paper presented at the annual meeting of National Council on Measurement in Education %D 1997 %T Incorporating decision consistency into Bayesian sequential testing %A Smith, R. %A Lewis, C. 
%B Paper presented at the annual meeting of National Council on Measurement in Education %C Chicago %G eng %0 Generic %D 1997 %T Linking scores for computer-adaptive and paper-and-pencil administrations of the SAT (Research Report No 97-12) %A Lawrence, I. %A Feigenbaum, M. %C Princeton NJ: Educational Testing Service %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Maintaining a CAT item pool with operational data %A Levine, M. L. %A Segall, D. O. %A Williams, B. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Overview of the USMLE Step 2 computerized field test %A Luecht, RM %A Nungester, R. J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Psychometric mode effects and fit issues with respect to item difficulty estimates %A Hadidi, A. %A Luecht, RM %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Relationship of response latency to test design, examinee ability, and item difficulty in computer-based test administration %A Swanson, D. B. %A Featherman, C. M. %A Case, A. M. %A Luecht, RM %A Nungester, R. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Generic %D 1997 %T Unidimensional approximations for a computerized adaptive test when the item pool and latent space are multidimensional (Research Report 97-5) %A Spray, J. A. %A Abdel-Fattah, A. A. 
%A Huang, C.-Y. %A Lau, CA %C Iowa City IA: ACT Inc %G eng %0 Book Section %D 1996 %T Adaptive assessment and training using the neighbourhood of knowledge states %A Dowling, C. E. %A Hockemeyer, C. %A Ludwig, A. H. %C Frasson, C. and Gauthier, G. and Lesgold, A. (eds.) Intelligent Tutoring Systems, Third International Conference, ITS'96, Montreal, Canada, June 1996 Proceedings. Lecture Notes in Computer Science 1086. Berlin Heidelberg: Springer-Verlag 578-587. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1996 %T Effect of altering passing score in CAT when unidimensionality is violated %A Abdel-Fattah, A. A. %A Lau, CA %A Spray, J. A. %B Paper presented at the annual meeting of the American Educational Research Association %C New York NY %8 April %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T Heuristic-based CAT: Balancing item information, content, and exposure %A Luecht, RM %A Hadadi, A. %A Nungester, R. J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New York NY %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T Heuristics based CAT: Balancing item information, content, and exposure %A Luecht, RM %A Nungester, R. J. %A Hadadi, A.
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C New York NY %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1996 %T Modifying the NCLEX™ CAT item selection algorithm to improve item exposure %A Way, W. D. %A Zara, A. %A Leahy, J. %B Paper presented at the annual meeting of the American Educational Research Association %C New York %G eng %0 Journal Article %J Applied Psychological Measurement %D 1996 %T Multidimensional computerized adaptive testing in a certification or licensure context %A Luecht, RM %K computerized adaptive testing %X (from the journal abstract) Multidimensional item response theory (MIRT) computerized adaptive testing, building on recent work by D. O. Segall (1996), is applied in a licensing/certification context. An example of a medical licensure test is used to demonstrate situations in which complex, integrated content must be balanced at the total test level for validity reasons, but items assigned to reportable subscore categories may be used under a MIRT adaptive paradigm to improve the reliability of the subscores. A heuristic optimization framework is outlined that generalizes to both univariate and multivariate statistical objective functions, with additional systems of constraints included to manage the content balancing or other test specifications on adaptively constructed test forms. Simulation results suggested that a multivariate treatment of the problem, although complicating somewhat the objective function used and the estimation of traits, nonetheless produces advantages from a psychometric perspective. (PsycINFO Database Record (c) 2003 APA, all rights reserved).
%B Applied Psychological Measurement %V 20 %P 389-404 %G eng %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 1996 %T Person-fit indices and their role in the CAT environment %A David, L. A. %A Lewis, C. %B Paper presented at the Annual meeting of the National Council on Measurement in Education %C New York NY %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T Person-fit indices and their role in the CAT environment %A McLeod, L. D. %A Lewis, C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New York %G eng %0 Book %D 1996 %T Robustness of a unidimensional computerized testing mastery procedure with multidimensional testing data %A Lau, CA %C Unpublished doctoral dissertation, University of Iowa, Iowa City IA %0 Conference Paper %B Paper presented at the annual meeting of National Council on Measurement in Education %D 1996 %T A search procedure to determine sets of decision points when using testlet-based Bayesian sequential testing procedures %A Smith, R. %A Lewis, C. %B Paper presented at the annual meeting of National Council on Measurement in Education %C New York %G eng %0 Generic %D 1996 %T Some practical examples of computerized adaptive sequential testing (Internal Report) %A Luecht, RM %A Nungester, R. J. %C Philadelphia: National Board of Medical Examiners %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T Strategies for managing item pools to maximize item security %A Way, W. D. %A Zara, A. %A Leahy, J.
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego %G eng %0 Conference Paper %B Poster session presented at the annual meeting of the American Educational Research Association %D 1996 %T Using unidimensional IRT models for dichotomous classification via CAT with multidimensional data %A Lau, CA %A Abdel-Fattah, A. A. %A Spray, J. A. %B Poster session presented at the annual meeting of the American Educational Research Association %C Boston MA %G eng %0 Journal Article %J Teaching and Learning in Medicine %D 1996 %T Validity of item selection: A comparison of automated computerized adaptive and manual paper and pencil examinations %A Lunz, M. E. %A Deville, C. W. %B Teaching and Learning in Medicine %V 8 %P 152-157 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1995 %T A Bayesian computerized mastery model with multiple cut scores %A Smith, R. L. %A Lewis, C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco CA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1995 %T A comparison of classification agreement between adaptive and full-length test under the 1-PL and 2-PL models %A Lewis, M. J. %A Subhiyah, R. G. %A Morrison, C. A. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco CA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1995 %T A comparison of two IRT-based models for computerized mastery testing when item parameter estimates are uncertain %A Way, W. D. %A Lewis, C. %A Smith, R. L. 
%B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco %G eng %0 Journal Article %J Journal of the American Dietetic Association %D 1995 %T Computer-adaptive testing: A new breed of assessment %A Ruiz, B. %A Fitz, P. A. %A Lewis, C. %A Reidy, C. %B Journal of the American Dietetic Association %V 95 %P 1326-1327 %G eng %0 Journal Article %J Rasch Measurement Transactions %D 1995 %T Computer-adaptive testing: CAT: A Bayesian maximum-falsification approach %A Linacre, J. M. %B Rasch Measurement Transactions %V 9 %P 412 %G eng %0 Journal Article %J Journal of Educational Computing Research %D 1995 %T Computerized adaptive testing: Tracking candidate response patterns %A Lunz, M. E. %A Bergstrom, Betty A. %X Tracked the effect of candidate response patterns on a computerized adaptive test. Data were from a certification examination in laboratory science administered in 1992 to 155 candidates, using a computerized adaptive algorithm. The 90-item certification examination was divided into 9 units of 10 items each to track the pattern of initial responses and response alterations on ability estimates and test precision across the 9 test units. The precision of the test was affected most by response alterations during early segments of the test. While candidates generally benefited from altering responses, individual candidates showed different patterns of response alterations across test segments. Test precision was minimally affected, suggesting that the tailoring of computerized adaptive testing is minimally affected by response alterations. (PsycINFO Database Record (c) 2002 APA, all rights reserved).
%B Journal of Educational Computing Research %V 13 %P 151-162 %G eng %0 Generic %D 1995 %T Controlling item exposure conditional on ability in computerized adaptive testing (Research Report 95-25) %A Stocking, M. L. %A Lewis, C. %C Princeton NJ: Educational Testing Service. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1995 %T The effect of model misspecification on classification decisions made using a computerized test: UIRT versus MIRT %A Abdel-Fattah, A. A. %A Lau, C.-M. A. %B Paper presented at the annual meeting of the Psychometric Society %C Minneapolis MN %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1995 %T Equating computerized adaptive certification examinations: The Board of Registry series of studies %A Lunz, M. E. %A Bergstrom, Betty A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco %G eng %0 Book Section %D 1995 %T Individualized testing in the classroom %A Linacre, J. M. %C Anderson, L.W. (Ed.), International Encyclopedia of Teaching and Teacher Education. Oxford, New York, Tokyo: Elsevier Science 295-299. %G eng %0 Generic %D 1995 %T A new method of controlling item exposure in computerized adaptive testing (Research Report 95-25) %A Stocking, M. L. %A Lewis, C. %C Princeton NJ: Educational Testing Service %G eng %0 Generic %D 1995 %T Some alternative CAT item selection heuristics (Internal report) %A Luecht, RM %C Philadelphia PA: National Board of Medical Examiners %G eng %0 Journal Article %J International journal of Educational Research %D 1994 %T Computer adaptive testing %A Lunz, M. E. %A Bergstrom, Betty A. %A Gershon, R. C. %B International journal of Educational Research %V 6 %P 623-634 %G eng %0 Generic %D 1994 %T Computerized mastery testing using fuzzy set decision theory (Research Report 94-37) %A Du, Y. %A Lewis, C. %A Pashley, P. J. 
%C Princeton NJ: Educational Testing Service %G eng %0 Journal Article %J Applied Measurement in Education %D 1994 %T The effect of review on the psychometric characteristics of computerized adaptive tests %A Lunz, M. E. %A Stone, G. E. %B Applied Measurement in Education %V 7(3) %P 211-222 %G eng %0 Journal Article %J Applied Measurement in Education %D 1994 %T The effect of review on the psychometric characteristics of computerized adaptive tests %A Stone, G. E. %A Lunz, M. E. %X Explored the effect of reviewing items and altering responses on examinee ability estimates, test precision, test information, decision confidence, and pass/fail status for computerized adaptive tests. Two different populations of examinees took different computerized certification examinations. For purposes of analysis, each population was divided into 3 ability groups (high, medium, and low). Ability measures before and after review were highly correlated, but slightly lower decision confidence was found after review. Pass/fail status was most affected for examinees with estimates close to the pass point. Decisions remained the same for 94% of the examinees. Test precision is only slightly affected by review, and the average information loss can be recovered by the addition of one item. (PsycINFO Database Record (c) 2002 APA, all rights reserved). %B Applied Measurement in Education %V 7 %P 211-222 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1994 %T An empirical study of computerized adaptive test administration conditions %A Lunz, M. E. %A Bergstrom, Betty A.
%B Journal of Educational Measurement %V 31 %P 251-263 %8 Fall %G eng %0 Book Section %B Objective measurement: Theory into practice %D 1994 %T The equivalence of Rasch item calibrations and ability estimates across modes of administration %A Bergstrom, Betty A. %A Lunz, M. E. %K computerized adaptive testing %B Objective measurement: Theory into practice %I Ablex Publishing Co. %C Norwood, N.J. USA %V 2 %P 122-128 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1994 %T A few more issues to consider in multidimensional computerized adaptive testing %A Luecht, RM %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1994 %T Item calibration considerations: A comparison of item calibrations on written and computerized adaptive examinations %A Stone, G. E. %A Lunz, M. E. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1994 %T Pinpointing PRAXIS I CAT characteristics through simulation procedures %A Eignor, D. R. %A Folk, V. G. %A Li, M.-Y. %A Stocking, M. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans, LA %0 Book Section %B Objective measurement, theory into practice %D 1994 %T Reliability of alternate computer adaptive tests %A Lunz, M. E. %A Bergstrom, Betty A. %A Wright, B. D. %B Objective measurement, theory into practice %I Ablex %C New Jersey %V II %G eng %0 Journal Article %J Nurs Health Care %D 1993 %T Computerized adaptive testing: the future is upon us %A Halkitis, P. N. %A Leahy, J. M.
%K *Computer-Assisted Instruction %K *Education, Nursing %K *Educational Measurement %K *Reaction Time %K Humans %K Pharmacology/education %K Psychometrics %B Nurs Health Care %7 1993/09/01 %V 14 %P 378-85 %8 Sep %@ 0276-5284 (Print) %G eng %M 8247367 %0 Journal Article %J Applied Measurement in Education %D 1993 %T Computerized mastery testing using fuzzy set decision theory %A Du, Y. %A Lewis, C. %A Pashley, P. J. %B Applied Measurement in Education %V 6 %P 181-193 %G eng %0 Generic %D 1993 %T Les tests adaptatifs en langue seconde %A Laurier, M. %C Communication lors de la 16e session d’étude de l’ADMÉÉ à Laval. Montréal: Association pour le développement de la mesure et de l’évaluation en éducation. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1993 %T Test targeting and precision before and after review on computer-adaptive tests %A Lunz, M. E. %A Stahl, J. A. %A Bergstrom, Betty A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Atlanta GA %G eng %0 Book Section %D 1993 %T Un test adaptatif en langue seconde : la perception des apprenants %A Laurier, M. %C R.Hivon (Éd.),L’évaluation des apprentissages. Sherbrooke : Éditions du CRP. %G eng %0 Journal Article %J Applied Measurement in Education %D 1992 %T Altering the level of difficulty in computer adaptive testing %A Bergstrom, Betty A. %A Lunz, M. E. %A Gershon, R. C. %K computerized adaptive testing %X Examines the effect of altering test difficulty on examinee ability measures and test length in a computer adaptive test. The 225 Ss were randomly assigned to 3 test difficulty conditions and given a variable length computer adaptive test. Examinees in the hard, medium, and easy test condition took a test targeted at the 50%, 60%, or 70% probability of correct response. 
The results show that altering the probability of a correct response does not affect estimation of examinee ability and that taking an easier computer adaptive test only slightly increases the number of items necessary to reach specified levels of precision. (PsycINFO Database Record (c) 2002 APA, all rights reserved). %B Applied Measurement in Education %V 5 %P 137-149 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1992 %T A comparison of the performance of simulated hierarchical and linear testlets %A Wainer, H., %A Kaplan, B. %A Lewis, C. %B Journal of Educational Measurement %V 29 %P 243-251 %G eng %0 Journal Article %J Educational Measurement: Issues and Practice %D 1992 %T Computerized adaptive testing with different groups %A Legg, S. M., %A Buhr, D. C. %B Educational Measurement: Issues and Practice %V 11 (2) %P 23-27 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1992 %T Computerized Mastery Testing With Nonequivalent Testlets %A Sheehan, K. %A Lewis, C. %B Applied Psychological Measurement %V 16 %P 65-76 %G English %N 1 %0 Journal Article %J Evaluation and the Health Professions %D 1992 %T Confidence in pass/fail decisions for computer adaptive and paper and pencil examinations %A Bergstrom, Betty A. %A Lunz, M. E. %X Compared the level of confidence in pass/fail decisions obtained with computer adaptive tests (CADTs) and pencil-and-paper tests (PPTs). 600 medical technology students took a variable-length CADT and 2 fixed-length PPTs. The CADT was stopped when the examinee ability estimate was either 1.3 times the standard error of measurement above or below the pass/fail point or when a maximum test length was reached.
Results show that greater confidence in the accuracy of the pass/fail decisions was obtained for more examinees when the CADT implemented a 90% confidence stopping rule than with PPTs of comparable test length. (PsycINFO Database Record (c) 2002 APA, all rights reserved). %B Evaluation and the Health Professions %V 15 %P 453-464 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1992 %T The effect of review on student ability and test efficiency for computerized adaptive tests %A Lunz, M. E. %A Bergstrom, Betty A. %A Wright, Benjamin D. %X 220 students were randomly assigned to a review condition for a medical technology test; their test instructions indicated that each item must be answered when presented, but that the responses could be reviewed and altered at the end of the test. A sample of 492 students did not have the opportunity to review and alter responses. Within the review condition, examinee ability estimates before and after review were correlated .98. The average efficiency of the test was decreased by 1% after review. Approximately 32% of the examinees improved their ability estimates after review but did not change their pass/fail status. Disallowing review on adaptive tests administered under these rules is not supported by these data. (PsycINFO Database Record (c) 2002 APA, all rights reserved). %B Applied Psychological Measurement %V 16 %P 33-40 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1992 %T Item selection using an average growth approximation of target information functions %A Luecht, RM %A Hirsch, T. M.
%B Applied Psychological Measurement %V 16 %P 41-51 %G eng %0 Generic %D 1992 %T The Language Training Division's computer adaptive reading proficiency test %A Janczewski, D. %A Lowe, P. %C Provo, UT: Language Training Division, Office of Training and Education %G eng %0 Conference Paper %D 1992 %T Multidimensional CAT simulation study %A Luecht, RM %G eng %0 Journal Article %J Journal of Educational Measurement %D 1991 %T Building algebra testlets: A comparison of hierarchical and linear structures %A Wainer, H., %A Lewis, C. %A Kaplan, B. %A Braswell, J. %B Journal of Educational Measurement %V 8 %P xxx-xxx %G eng %0 Journal Article %J Journal of Allied Health %D 1991 %T Comparability of decisions for computer adaptive and written examinations %A Lunz, M. E. %A Bergstrom, Betty A. %B Journal of Allied Health %V 20 %P 15-23 %G eng %0 Generic %D 1991 %T Comparisons of computer adaptive and pencil and paper tests %A Bergstrom, Betty A. %A Lunz, M. E. %C Chicago IL: American Society of Clinical Pathologists %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1991 %T Confidence in pass/fail decisions for computer adaptive and paper and pencil examinations %A Bergstrom, B. B %A Lunz, M. E. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago IL %G eng %0 Generic %D 1991 %T Construction and validation of the SON-R 5-17, the Snijders-Oomen non-verbal intelligence test %A Laros, J. A. %A Tellegen, P. J. %C Groningen: Wolters-Noordhoff %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1991 %T Development and evaluation of hierarchical testlets in two-stage tests using integer linear programming %A Lam, T. L. %A Goong, Y. Y. 
%B Paper presented at the annual meeting of the American Educational Research Association %C Chicago IL %G eng %0 Generic %D 1991 %T Some empirical guidelines for building testlets (Technical Report 91-56) %A Wainer, H. %A Kaplan, B. %A Lewis, C. %C Princeton NJ: Educational Testing Service, Program Statistics Research %G eng %0 Conference Paper %B annual meeting of the American Educational Research Association %D 1991 %T The use of the graded response model in computerized adaptive testing of the attitudes to science scale %A Foong, Y-Y. %A Lam, T-L. %X The graded response model for two-stage testing was applied to an attitudes toward science scale using real-data simulation. The 48-item scale was administered to 920 students at a grade-8 equivalent in Singapore. A two-stage 16-item computerized adaptive test was developed. In two-stage testing an initial, or routing, test is followed by a second-stage testlet of greater or lesser difficulty based on performance. A conventional test of the same length as the adaptive two-stage test was selected from the 48-item pool. Responses to the conventional test, the routing test, and a testlet were simulated. The algorithm of E. Balas (1965) and the multidimensional knapsack problem of optimization theory were used in test development. The simulation showed the efficiency and accuracy of the two-stage test with the graded response model in estimating attitude trait levels, as evidenced by better results from the two-stage test than its conventional counterpart and the reduction to one-third of the length of the original measure. Six tables and three graphs are included. (SLD) %B annual meeting of the American Educational Research Association %C Chicago, IL USA %8 April 3-7, 1991 %G eng %M ED334272 %0 Generic %D 1990 %T An adaptive algebra test: A testlet-based, hierarchically structured test with validity-based scoring %A Wainer, H. %A Lewis, C. %A Kaplan, B. %A Braswell, J.
%C ETS Technical Report 90-92 %G eng %0 Generic %D 1990 %T Generative adaptive testing with digit span items %A Wolfe, J. H. %A Larson, G. E. %C San Diego, CA: Testing Systems Department, Navy Personnel Research and Development Center %G eng %0 Conference Paper %B Paper presented at the Midwest Objective Measurement Seminar %D 1990 %T The stability of Rasch pencil and paper item calibrations on computer adaptive tests %A Bergstrom, Betty A. %A Lunz, M. E. %B Paper presented at the Midwest Objective Measurement Seminar %C Chicago IL %G eng %0 Conference Proceedings %B annual meeting of the National Council on Measurement in Education %D 1990 %T Test-retest consistency of computer adaptive tests %A Lunz, M. E. %A Bergstrom, Betty A. %A Gershon, R. C. %B annual meeting of the National Council on Measurement in Education %C Boston, MA USA %8 04/1990 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1990 %T Toward a psychometrics for testlets %A Wainer, H. %A Lewis, C. %B Journal of Educational Measurement %V 27 %P 1-14 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1990 %T Using Bayesian decision theory to design a computerized mastery test %A Lewis, C. %A Sheehan, K. %B Applied Psychological Measurement %V 14 %P 367-386 %G eng %N 4 %0 Conference Paper %B Paper presented at the annual meeting of the Regional Language Center Seminar %D 1990 %T What can we do with computerized adaptive testing and what we cannot do? %A Laurier, M.
%B Paper presented at the annual meeting of the Regional Language Center Seminar %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1989 %T EXSPRT: An expert systems approach to computer-based adaptive testing %A Frick, T. W. %A Plew, G. T. %A Luk, H.-K. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco CA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1989 %T Investigating the validity of a computerized adaptive test for different examinee groups %A Buhr, D. C. %A Legg, S. M. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco CA %G eng %0 Generic %D 1988 %T Computerized adaptive testing: The state of the art in assessment at three community colleges %A League for Innovation in the Community College %C Laguna Hills CA: Author %G eng %0 Journal Article %J Machine-Mediated Learning %D 1988 %T Computerized mastery testing %A Lewis, C. %A Sheehan, K. %B Machine-Mediated Learning %V 2 %P 283-286 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1988 %T Simple and effective algorithms [for] computer-adaptive testing %A Linacre, J. M. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Generic %D 1987 %T A computer program for adaptive testing by microcomputer (MESA Memorandum No. 40) %A Linacre, J. M. %C Chicago: University of Chicago. (ERIC ED 280 895.) %G eng %0 Generic %D 1987 %T Computerized adaptive language testing: A Spanish placement exam %A Larson, J. W. %C In Language Testing Research: Selected Papers from the Colloquium, Monterey CA %G eng %0 Generic %D 1987 %T Final report: Feasibility study of a computerized test administration of the CLAST %A Legg, S. M. %A Buhr, D. C.
%C University of Florida: Institute for Student Assessment and Evaluation %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1986 %T The effects of computer experience on computerized adaptive test performance %A Lee, J. A. %B Educational and Psychological Measurement %V 46 %P 727-733 %G eng %0 Journal Article %J Review of Educational Research %D 1985 %T Implications for altering the context in which test items appear: A historical perspective on an immediate concern %A Leary, L. F. %A Dorans, N. J. %B Review of Educational Research %V 55 %P 387-413 %G eng %0 Journal Article %J Annual Review of Psychology %D 1985 %T Latent structure and item sampling models for testing %A Traub, R. E. %A Lam, Y. R. %B Annual Review of Psychology %V 36 %P 19-48 %G eng %0 Generic %D 1984 %T Efficiency and precision in two-stage adaptive testing %A Loyd, B. H. %C West Palm Beach Florida: Eastern ERA %G eng %0 Generic %D 1984 %T Evaluation plan for the computerized adaptive vocational aptitude battery (Research Report 82-1) %A Green, B. F. %A Bock, R. D. %A Humphreys, L. G. %A Linn, R. L. %A Reckase, M. D. %G eng %0 Journal Article %J Journal of Educational Measurement %D 1984 %T A plan for scaling the computerized adaptive Armed Services Vocational Aptitude Battery %A Green, B. F. %A Bock, R. D. %A Linn, R. L. %A Lord, F. M. %A Reckase, M. D. %B Journal of Educational Measurement %V 21 %P 347-360 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1984 %T Technical guidelines for assessing computerized adaptive tests %A Green, B. F. %A Bock, R. D. %A Humphreys, L. G. %A Linn, R. L. %A Reckase, M. D. %K computerized adaptive testing %K Mode effects %K paper-and-pencil %B Journal of Educational Measurement %V 21 %P 347-360 %@ 1745-3984 %G eng %0 Book Section %B New horizons in testing: Latent trait test theory and computerized adaptive testing %D 1983 %T Small N justifies Rasch model %A Lord, F. M. %E Bock, R. D.
%B New horizons in testing: Latent trait test theory and computerized adaptive testing %I Academic Press %C New York, NY USA %P 51-61 %G eng %0 Conference Paper %B Paper presented at the 23rd conference of the Military Testing Association %D 1982 %T Legal and political considerations in large-scale adaptive testing %A Waters, B. K. %A Lee, G. C. %B Paper presented at the 23rd conference of the Military Testing Association %G eng %0 Journal Article %J Journal of Communication Disorders %D 1980 %T Computer applications in audiology and rehabilitation of the hearing impaired %A Levitt, H. %B Journal of Communication Disorders %V 13 %P 471-481 %G eng %0 Book Section %D 1980 %T Some how and which for practical tailored testing %A Lord, F. M. %C L. J. T. van der Kamp, W. F. Langerak and D. N. M. de Gruijter (Eds.), Psychometrics for educational debates (pp. 189-206). New York: John Wiley and Sons. %G eng %0 Book Section %D 1978 %T Panel discussion: Future directions for computerized adaptive testing %A Lord, F. M. %C D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. %G eng %0 Journal Article %J Applied Psychological Measurement %D 1977 %T A broad-range tailored test of verbal ability %A Lord, F.
M. %B Applied Psychological Measurement %V 1 %P 95-100 %G eng %N 1 %0 Conference Paper %B Third International Symposium on Educational Testing %D 1977 %T Group tailored tests and some problems of their utilization %A Lewy, A. %A Doron, R. %B Third International Symposium on Educational Testing %C Leyden, The Netherlands %8 06/1977 %G eng %0 Book Section %D 1977 %T A low-cost terminal usable for computerized adaptive testing %A Lamos, J. P. %A Waters, B. K. %C D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Book Section %D 1976 %T A broad range tailored test of verbal ability %A Lord, F. M. %C C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 75-78). Washington DC: U.S. Government Printing Office. %G eng %0 Book Section %D 1976 %T Discussion %A Lord, F. M. %C C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 113-117). Washington DC: U.S. Government Printing Office. %G eng %0 Book Section %D 1976 %T Some likelihood functions found in tailored testing %A Lord, F. M. %C C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 79-81). Washington DC: U.S. Government Printing Office. %G eng %0 Generic %D 1976 %T Test theory and the public interest %A Lord, F. M. %C Proceedings of the Educational Testing Service Invitational Conference %G eng %0 Generic %D 1975 %T A broad range test of verbal ability (RB-75-5) %A Lord, F. M. %C Princeton NJ: Educational Testing Service %G eng %0 Book Section %D 1975 %T Discussion %A Linn, R. L. %C D. J.
Weiss (Ed.), Computerized adaptive trait measurement: Problems and prospects (Research Report 75-5), pp. 44-46. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng %0 Generic %D 1975 %T An empirical comparison of two-stage and pyramidal ability testing (Research Report 75-1) %A Larkin, K. C. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Generic %D 1974 %T An empirical investigation of computer-administered pyramidal ability testing (Research Report 74-3) %A Larkin, K. C. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Book Section %D 1974 %T Individualized testing and item characteristic curve theory %A Lord, F. M. %C D. H. Krantz, R. C. Atkinson, R. D. Luce, and P. Suppes (Eds.), Contemporary developments in mathematical psychology (Vol. II). San Francisco: Freeman. %G eng %0 Generic %D 1974 %T Practical methods for redesigning a homogeneous test, also for designing a multilevel test (RB-74-30) %A Lord, F. M. %C Princeton NJ: Educational Testing Service %G eng %0 Generic %D 1972 %T Individualized testing and item characteristic curve theory (RB-72-50) %A Lord, F. M. %C Princeton NJ: Educational Testing Service %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1972 %T Sequential testing for dichotomous decisions %A Linn, R. L. %A Rock, D. A. %A Cleary, T. A. %K CCAT %K classification computerized adaptive testing %K sequential probability ratio testing %K SPRT %B Educational and Psychological Measurement %V 32 %P 85-95 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1971 %T Robbins-Monro procedures for tailored testing %A Lord, F. M. %B Educational and Psychological Measurement %V 31 %P 3-31 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1971 %T The self-scoring flexilevel test %A Lord, F.
M. %B Journal of Educational Measurement %V 8 %P 147-151 %G eng %0 Generic %D 1971 %T Tailored testing: An application of stochastic approximation (RM 71-2) %A Lord, F. M. %C Princeton NJ: Educational Testing Service %G eng %0 Journal Article %J Journal of the American Statistical Association %D 1971 %T Tailored testing, an application of stochastic approximation %A Lord, F. M. %B Journal of the American Statistical Association %V 66 %P 707-711 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1971 %T A theoretical study of the measurement effectiveness of flexilevel tests %A Lord, F. M. %B Educational and Psychological Measurement %V 31 %P 805-813 %G eng %0 Journal Article %J Psychometrika %D 1971 %T A theoretical study of two-stage testing %A Lord, F. M. %B Psychometrika %V 36 %P 227-242 %G eng %0 Generic %D 1970 %T The self-scoring flexilevel test (RB-70-43) %A Lord, F. M. %C Princeton NJ: Educational Testing Service %G eng %0 Generic %D 1970 %T Sequential testing for dichotomous decisions. College Entrance Examination Board Research and Development Report (RDR 69-70, No. 3; Educational Testing Service RB-70-31) %A Linn, R. L. %A Rock, D. A. %A Cleary, T. A. %C Princeton NJ: Educational Testing Service. %G eng %0 Book Section %D 1970 %T Some test theory for tailored testing %A Lord, F. M. %C W. H. Holtzman (Ed.), Computer-assisted instruction, testing, and guidance (pp. 139-183). New York: Harper and Row. %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1969 %T The development and evaluation of several programmed testing methods %A Linn, R. L. %A Cleary, T. A. %B Educational and Psychological Measurement %V 29 %P 129-146 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1968 %T An exploratory study of programmed tests %A Cleary, T. A. %A Linn, R. L. %A Rock, D. A.
%B Educational and Psychological Measurement %V 28 %P 345-360 %G eng %0 Book %D 1968 %T Computer-assisted testing %E Harman, H. H. %E Helm, C. E. %E Loye, D. E. %C Princeton NJ: Educational Testing Service %G eng %0 Generic %D 1968 %T The development and evaluation of several programmed testing methods (Research Bulletin 68-5) %A Linn, R. L. %A Rock, D. A. %A Cleary, T. A. %C Princeton NJ: Educational Testing Service %G eng %0 Journal Article %J Journal of Educational Measurement %D 1968 %T Reproduction of total test score through the use of sequential programmed tests %A Cleary, T. A. %A Linn, R. L. %A Rock, D. A. %B Journal of Educational Measurement %V 5 %P 183-187 %G eng