02111nas a2200193 4500008003900000245007100039210006900110300000900179490000700188520153100195653000701726653003201733653001701765653002301782100001501805700001501820700001901835856006301854 2020 d00aItem Calibration Methods With Multiple Subscale Multistage Testing0 aItem Calibration Methods With Multiple Subscale Multistage Testi a3-280 v573 aMany large-scale educational surveys have moved from linear form design to multistage testing (MST) design. One advantage of MST is that it can provide more accurate latent trait (θ) estimates using fewer items than required by linear tests. However, MST generates incomplete response data by design; hence, questions remain as to how to calibrate items using the incomplete data from an MST design. A further complication arises when there are multiple correlated subscales per test, and when items from different subscales need to be calibrated according to their respective score reporting metrics. The current calibration-per-subscale method produces biased item parameters, and no method has been available to resolve this challenge. Drawing on missing data principles, we show that when all items are calibrated together, Rubin's ignorability assumption is satisfied, so that traditional single-group calibration is sufficient. When items are calibrated per subscale, we propose a simple modification to the current calibration-per-subscale method that reinstates the missing-at-random assumption and therefore corrects the estimation bias that would otherwise exist. Three mainstream calibration methods are discussed in the context of MST: marginal maximum likelihood estimation, the expectation maximization method, and fixed parameter calibration. An extensive simulation study is conducted, and a real data example from NAEP is analyzed to provide convincing empirical evidence.10aEM10amarginal maximum likelihood10amissing data10amultistage testing1 aWang, Chun1 aChen, Ping1 aJiang, Shengyu uhttps://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.1224100579nas a2200169 4500008004500000245007100045210006900116300000900185490000600194653003100200653002000231653005100251100001800302700001400320700001500334856006000349 2019 eng d00aHow Adaptive Is an Adaptive Test: Are All Adaptive Tests Adaptive?0 aHow Adaptive Is an Adaptive Test Are All Adaptive Tests Adaptive a1-140 v710acomputerized adaptive test10amultistage test10astatistical indicators of amount of adaptation1 aReckase, Mark1 aJu, Unhee1 aKim, Sewon uhttp://iacat.org/jcat/index.php/jcat/article/view/69/3402101nas a2200181 4500008004100000245004600041210004600087260005500133520153800188653002501726653000801751100002801759700001901787700001301806700002001819700001301839856006701852 2017 eng d00aBayesian Perspectives on Adaptive Testing0 aBayesian Perspectives on Adaptive Testing aNiigata, JapanbNiigata Seiryo Universityc08/20173 a
Although adaptive testing is usually treated from the perspective of maximum-likelihood parameter estimation and maximum-information item selection, a Bayesian perspective is more natural, statistically efficient, and computationally tractable. This observation holds not only for the core process of ability estimation but also for such processes as item calibration and real-time monitoring of item security. Key elements of the approach are parametric modeling of each relevant process, updating of the parameter estimates after the arrival of each new response, and optimal design of the next step.
The purpose of the symposium is to illustrate the role of Bayesian statistics in this approach. The first presentation discusses a basic Bayesian algorithm for the sequential update of any parameter in adaptive testing and illustrates the idea of Bayesian optimal design for the two processes of ability estimation and online item calibration. The second presentation generalizes the ideas to the case of adaptive testing with polytomous items. The third presentation uses the fundamental Bayesian idea of sampling from updated posterior predictive distributions (“multiple imputations”) to deal with the problem of scoring incomplete adaptive tests.
10aBayesian Perspective10aCAT1 avan der Linden, Wim, J.1 aJiang, Bingnan1 aRen, Hao1 aChoi, Seung, W.1 aDiao, Qi uhttp://mail.iacat.org/bayesian-perspectives-adaptive-testing-002109nas a2200169 4500008004100000245005200041210005100093260005500144520156900199653002101768653000801789653002301797100001501820700001901835700001301854856007201867 2017 eng d00aMHK-MST Design and the Related Simulation Study0 aMHKMST Design and the Related Simulation Study aNiigata, JapanbNiigata Seiryo Universityc08/20173 a
The MHK is a national standardized exam that tests and rates Chinese language proficiency. It assesses the ability of non-native Chinese minorities to use the Chinese language in their daily, academic, and professional lives. Computerized multistage adaptive testing (MST) combines features of conventional paper-and-pencil (P&P) testing and item-level computerized adaptive testing (CAT): it is a computer-delivered test design that uses the item set, rather than the individual item, as the unit of scoring and routing. MST can estimate extreme ability values more accurately than conventional P&P testing, and it uses the adaptive character of CAT to reduce test length and shorten score reporting time. MST is already used in several large testing programs, such as the Uniform CPA Examination and the Graduate Record Examination (GRE). It is therefore necessary to develop MST applications in China.
Based on the characteristics of the MHK and its future development, the researchers began by designing the MHK-MST. This simulation study was conducted to validate the performance of the MHK-MST system. Real difficulty parameters of MHK items and simulated ability parameters of the candidates were used to generate the original score matrix, and the item modules were delivered to the candidates following the adaptive procedures set according to the path rules. This simulation study provides a sound basis for the implementation of the MHK-MST.
10alanguage testing10aMHK10amultistage testing1 aYuyu, Ling1 aChenglin, Zhou1 aJie, Ren uhttp://mail.iacat.org/mhk-mst-design-and-related-simulation-study-001575nas a2200145 4500008003900000245008500039210006900124300001200193490000700205520109100212100002501303700002701328700002101355856005301376 2013 d00aUncertainties in the Item Parameter Estimates and Robust Automated Test Assembly0 aUncertainties in the Item Parameter Estimates and Robust Automat a123-1390 v373 aItem response theory parameters have to be estimated, and because of the estimation process, they do have uncertainty in them. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values, and uncertainty is not taken into account. As a consequence, resulting tests might be off target or less informative than expected. In this article, the process of parameter estimation is described to provide insight into the causes of uncertainty in the item parameters. The consequences of uncertainty are studied. Besides, an alternative automated test assembly algorithm is presented that is robust against uncertainties in the data. Several numerical examples demonstrate the performance of the robust test assembly algorithm, and illustrate the consequences of not taking this uncertainty into account. Finally, some recommendations about the use of robust test assembly and some directions for further research are given.
1 aVeldkamp, Bernard, P1 aMatteucci, Mariagiulia1 aJong, Martijn, G uhttp://apm.sagepub.com/content/37/2/123.abstract01848nas a2200169 4500008004100000245012200041210006900163300001200232490000700244520124800251100001201499700001101511700001401522700001101536700001401547856011701561 2012 eng d00aComparison Between Dichotomous and Polytomous Scoring of Innovative Items in a Large-Scale Computerized Adaptive Test0 aComparison Between Dichotomous and Polytomous Scoring of Innovat a493-5090 v723 aThis study explored the impact of partial credit scoring of one type of innovative items (multiple-response items) in a computerized adaptive version of a large-scale licensure pretest and operational test settings. The impacts of partial credit scoring on the estimation of the ability parameters and classification decisions in operational test settings were explored in one real data analysis and two simulation studies when two different polytomous scoring algorithms, automated polytomous scoring and rater-generated polytomous scoring, were applied. For the real data analyses, the ability estimates from dichotomous and polytomous scoring were highly correlated; the classification consistency between different scoring algorithms was nearly perfect. Information distribution changed slightly in the operational item bank. In the two simulation studies comparing each polytomous scoring with dichotomous scoring, the ability estimates resulting from polytomous scoring had slightly higher measurement precision than those resulting from dichotomous scoring. The practical impact related to classification decision was minor because of the extremely small number of items that could be scored polytomously in this current study.
1 aJiao, H1 aLiu, J1 aHaynie, K1 aWoo, A1 aGorham, J uhttp://mail.iacat.org/content/comparison-between-dichotomous-and-polytomous-scoring-innovative-items-large-scale01889nas a2200169 4500008004500000245015100045210006900196490000700265520128400272100001601556700001701572700001401589700001501603700001501618700001401633856007201647 2011 eng d00aDesign of a Computer-Adaptive Test to Measure English Literacy and Numeracy in the Singapore Workforce: Considerations, Benefits, and Implications0 aDesign of a ComputerAdaptive Test to Measure English Literacy an0 v123 aA computer adaptive test (CAT) is a delivery methodology that serves the larger goals of the assessment system in which it is embedded. A thorough analysis of the assessment system for which a CAT is being designed is critical to ensure that the delivery platform is appropriate and addresses all relevant complexities. As such, a CAT engine must be designed to conform to the
validity and reliability of the overall system. This design takes the form of adherence to the assessment goals and objectives of the adaptive assessment system. When the assessment is adapted for use in another country, consideration must be given to any necessary revisions including content differences. This article addresses these considerations while drawing, in part, on the process followed in the development of the CAT delivery system designed to test English language workplace skills for the Singapore Workforce Development Agency. Topics include item creation and selection, calibration of the item pool, analysis and testing of the psychometric properties, and reporting and interpretation of scores. The characteristics and benefits of the CAT delivery system are detailed as well as implications for testing programs considering the use of a
CAT delivery system.
Criteria have been proposed for assessing the severity of possible test security violations for computerized tests with high-stakes outcomes. However, these criteria resulted from theoretical derivations that assumed uniformly randomized item selection. This study investigated potential damage caused by organized item theft in computerized adaptive testing (CAT) for two realistic item selection methods, maximum item information and a-stratified with content blocking, using the randomized method as a baseline for comparison. Damage caused by organized item theft was evaluated by the number of compromised items each examinee could encounter and the impact of the compromised items on examinees' ability estimates. Severity of test security violation was assessed under self-organized and organized item theft simulation scenarios. Results indicated that although item theft could cause severe damage to CAT with either item selection method, the maximum item information method was more vulnerable to the organized item theft simulation than was the a-stratified method.
1 aQing Yi1 aJinming Zhang1 aChang, Hua-Hua uhttp://apm.sagepub.com/content/32/7/543.abstract01478nas a2200241 4500008004100000020002200041245007800063210006900141250001500210260001100225300001200236490000700248520069100255653002300946653002500969653002100994653006201015653001101077653001601088100001701104700001401121856010101135 2007 eng d a0277-6715 (Print)00aComputerized adaptive testing for measuring development of young children0 aComputerized adaptive testing for measuring development of young a2006/11/30 cJun 15 a2629-380 v263 aDevelopmental indicators that are used for routine measurement in The Netherlands are usually chosen to optimally identify delayed children. Measurements on the majority of children without problems are therefore quite imprecise. This study explores the use of computerized adaptive testing (CAT) to monitor the development of young children. CAT is expected to improve the measurement precision of the instrument. We do two simulation studies - one with real data and one with simulated data - to evaluate the usefulness of CAT. It is shown that CAT selects developmental indicators that maximally match the individual child, so that all children can be measured to the same precision.10a*Child Development10a*Models, Statistical10aChild, Preschool10aDiagnosis, Computer-Assisted/*statistics & numerical data10aHumans10aNetherlands1 aJacobusse, G1 aBuuren, S uhttp://mail.iacat.org/content/computerized-adaptive-testing-measuring-development-young-children00580nas a2200181 4500008004100000245008300041210006900124300001200193490000700205100001000212700001300222700001100235700001000246700001200256700001400268700001300282856010300295 2007 eng d00aProspective evaluation of the am-pac-cat in outpatient rehabilitation settings0 aProspective evaluation of the ampaccat in outpatient rehabilitat a385-3980 v871 aJette1 aHaley, S1 aTao, W1 aNi, P1 aMoed, R1 aMeyers, D1 aZurek, M uhttp://mail.iacat.org/content/prospective-evaluation-am-pac-cat-outpatient-rehabilitation-settings00526nas a2200133 4500008003900000245013200039210006900171300001200240490000700252100002300259700001900282700002500301856006600326 2006 d00aComparison of the Psychometric Properties of Several Computer-Based Test Designs for Credentialing Exams With Multiple Purposes0 aComparison of the Psychometric Properties of Several ComputerBas a203-2200 v191 aJodoin, Michael, G1 aZenisky, April1 aHambleton, Ronald, K uhttp://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_302653nas a2200397 4500008004100000020002200041245013500063210006900198250001500267260000800282300001200290490000700302520140700309653002601716653003101742653001501773653001001788653000901798653002201807653002501829653003301854653001101887653001101898653000901909653001601918653004601934653003001980653003102010653001302041100001502054700001002069700001802079700001602097700001502113856012702128 2006 eng d a0895-4356 (Print)00aComputer adaptive testing improved accuracy and precision of scores over random item selection in a physical functioning item bank0 aComputer adaptive testing improved accuracy and precision of sco a2006/10/10 cNov a1174-820 v593 aBACKGROUND AND OBJECTIVE: Measuring physical functioning (PF) within and across postacute settings is critical for monitoring outcomes of rehabilitation; however, most current instruments lack sufficient breadth and feasibility for widespread use. 
Computer adaptive testing (CAT), in which item selection is tailored to the individual patient, holds promise for reducing response burden while maintaining measurement precision. We calibrated a PF item bank via item response theory (IRT), administered items with a post hoc CAT design, and determined whether CAT would improve accuracy and precision of score estimates over random item selection. METHODS: 1,041 adults were interviewed during postacute care rehabilitation episodes in either hospital or community settings. Responses for 124 PF items were calibrated using IRT methods to create a PF item bank. We examined the accuracy and precision of CAT-based scores compared to a random selection of items. RESULTS: CAT-based scores had higher correlations with the IRT-criterion scores, especially with short tests, and resulted in narrower confidence intervals than scores based on a random selection of items; gains, as expected, were especially large for low and high performing adults. CONCLUSION: The CAT design may have important precision and efficiency advantages for point-of-care functional assessment in rehabilitation practice settings.10a*Recovery of Function10aActivities of Daily Living10aAdolescent10aAdult10aAged10aAged, 80 and over10aConfidence Intervals10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aRehabilitation/*standards10aReproducibility of Results10aSoftware1 aHaley, S M1 aNi, P1 aHambleton, RK1 aSlavin, M D1 aJette, A M uhttp://mail.iacat.org/content/computer-adaptive-testing-improved-accuracy-and-precision-scores-over-random-item-selectio-000643nas a2200169 4500008004100000020001300041245013500054210006900189300001400258490000700272100001300279700001000292700001800302700001400320700001300334856012600347 2006 eng d a0895435600aComputer adaptive testing improved accuracy and precision of scores over random item selection in a physical functioning item bank0 aComputer adaptive testing improved accuracy and precision of sco a1174-11820 v591 aHaley, S1 aNi, P1 aHambleton, RK1 aSlavin, M1 aJette, A uhttp://mail.iacat.org/content/computer-adaptive-testing-improved-accuracy-and-precision-scores-over-random-item-selection03384nas a2200205 4500008004100000020002200041245009700063210006900160260002500229300001200254490000700266520266000273653003402933653002802967100001502995700001903010700001703029700001403046856011803060 2006 eng d a0439-755X (Print)00a[Item Selection Strategies of Computerized Adaptive Testing based on Graded Response Model.]0 aItem Selection Strategies of Computerized Adaptive Testing based bScience Press: China a461-4670 v383 aItem selection strategy (ISS) is an important component of Computerized Adaptive Testing (CAT). Its performance directly affects the security, efficiency and precision of the test. Thus, ISS becomes one of the central issues in CATs based on the Graded Response Model (GRM). It is well known that the goal of an ISS is to administer the next unused item remaining in the item bank that best fits the examinee's current ability estimate. In dichotomous IRT models, every item has only one difficulty parameter and the item whose difficulty matches the examinee's current ability estimate is considered to be the best fitting item. However, in GRM, each item has more than two ordered categories and has no single value to represent the item difficulty. 
Consequently, some researchers have employed the average or the median difficulty value across categories as the difficulty estimate for the item. Using the average value and the median value in effect introduced two corresponding ISSs. In this study, we used computer simulation to compare four ISSs based on GRM. We also discussed the effect of a "shadow pool" on the uniformity of pool usage as well as the influence of different item parameter distributions and different ability estimation methods on the evaluation criteria of CAT. In the simulation process, the Monte Carlo method was adopted to simulate the entire CAT process; 1,000 examinees drawn from a standard normal distribution and four 1,000-sized item pools of different item parameter distributions were also simulated. The assumption of the simulation is that a polytomous item comprises six ordered categories. In addition, ability estimates were derived using two methods: expected a posteriori Bayesian (EAP) and maximum likelihood estimation (MLE). In MLE, the Newton-Raphson iteration method and the Fisher Score iteration method were employed, respectively, to solve the likelihood equation. Moreover, the CAT process was simulated 30 times for each examinee to eliminate random error. The ISSs were evaluated by four indices commonly used in CAT, covering four aspects--the accuracy of ability estimation, the stability of the ISS, the usage of the item pool, and the test efficiency. Simulation results supported the ISS that matched the estimate of an examinee's current trait level with the difficulty values across categories. Setting a "shadow pool" in the ISS improved the uniformity of pool utilization. Finally, different distributions of the item parameters and different ability estimation methods affected the evaluation indices of CAT. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive testing10aitem selection strategy1 aPing, Chen1 aShuliang, Ding1 aHaijing, Lin1 aJie, Zhou uhttp://mail.iacat.org/content/item-selection-strategies-computerized-adaptive-testing-based-graded-response-model00520nam a2200097 4500008004100000245009600041210006900137260008000206100002400286856011200310 2005 eng d00aA comparison of adaptive mastery testing using testlets with the 3-parameter logistic model0 acomparison of adaptive mastery testing using testlets with the 3 aUnpublished doctoral dissertation, University of Minnesota, Minneapolis, MN1 aJacobs-Cassuto, M S uhttp://mail.iacat.org/content/comparison-adaptive-mastery-testing-using-testlets-3-parameter-logistic-model02124nas a2200253 4500008004100000245007900041210006900120300001200189490000700201520113300208653002701341653004601368653005201414653002901466653001101495653005601506653002501562653004101587653004501628653006201673100001501735700001501750856010501765 2005 eng d00aContemporary measurement techniques for rehabilitation outcomes assessment0 aContemporary measurement techniques for rehabilitation outcomes a339-3450 v373 aIn this article, we review the limitations of traditional rehabilitation functional outcome instruments currently in use within the rehabilitation field to assess Activity and Participation domains as defined by the International Classification of Function, Disability, and Health. These include a narrow scope of functional outcomes, data incompatibility across instruments, and the precision vs feasibility dilemma. 
Following this, we illustrate how contemporary measurement techniques, such as item response theory methods combined with computer adaptive testing methodology, can be applied in rehabilitation to design functional outcome instruments that are comprehensive in scope, accurate, allow for compatibility across instruments, and are sensitive to clinically important change without sacrificing their feasibility. Finally, we present some of the pressing challenges that need to be overcome to provide effective dissemination and training assistance to ensure that current and future generations of rehabilitation professionals are familiar with and skilled in the application of contemporary outcomes measurement.10a*Disability Evaluation10aActivities of Daily Living/classification10aDisabled Persons/classification/*rehabilitation10aHealth Status Indicators10aHumans10aOutcome Assessment (Health Care)/*methods/standards10aRecovery of Function10aResearch Support, N.I.H., Extramural10aResearch Support, U.S. Gov't, Non-P.H.S.10aSensitivity and Specificity computerized adaptive testing1 aJette, A M1 aHaley, S M uhttp://mail.iacat.org/content/contemporary-measurement-techniques-rehabilitation-outcomes-assessment01316nas a2200229 4500008004100000020002200041245007700063210006900140260002500209300001200234490000700246520057900253653002700832653003000859653002500889100001100914700001600925700001500941700001300956700001500969856010200984 2005 eng d a0439-755X (Print)00a[Item characteristic curve equating under graded response models in IRT]0 aItem characteristic curve equating under graded response models bScience Press: China a832-8380 v373 aIn the economist test, one of the largest qualification examinations in China, item characteristic curve equating and an anchor-test equating design under graded response models in IRT were used to guarantee comparability across years, construct an item bank, and prepare for computerized adaptive testing. These methods achieved the equating of item and ability parameters for five years of test data and succeeded in establishing an item bank. On this basis, cut scores from different years were compared through equating, providing empirical grounds for setting the eligibility standard of the economist test. 10agraded response models10aitem characteristic curve10aItem Response Theory1 aJun, Z1 aDongming, O1 aShuyuan, X1 aHaiqi, D1 aShuqing, Q uhttp://mail.iacat.org/content/item-characteristic-curve-equating-under-graded-response-models-irt03708nas a2200481 4500008004100000245005200041210005200093300001200145490000700157520221100164653001902375653002902394653005802423653001002481653005302491653000902544653001102553653002502564653002602589653003302615653001102648653001002659653000902669653001602678653002402694653007402718653001802792653002902810653005802839653003102897653003202928653003602960653003202996100001503028700001603043700001603059700001603075700001003091700001403101700001803115700001503133856007803148 2004 eng d00aActivity outcome measurement for postacute care0 aActivity outcome measurement for postacute care aI49-1610 v423 aBACKGROUND: Efforts to evaluate the effectiveness of a broad range of postacute care services have been hindered by the lack of conceptually sound and comprehensive measures of outcomes. It is critical to determine a common underlying structure before employing current methods of item equating across outcome instruments for future item banking and computer-adaptive testing applications. 
OBJECTIVE: To investigate the factor structure, reliability, and scale properties of items underlying the Activity domains of the International Classification of Functioning, Disability and Health (ICF) for use in postacute care outcome measurement. METHODS: We developed a 41-item Activity Measure for Postacute Care (AM-PAC) that assessed an individual's execution of discrete daily tasks in his or her own environment across major content domains as defined by the ICF. We evaluated the reliability and discriminant validity of the prototype AM-PAC in 477 individuals in active rehabilitation programs across 4 rehabilitation settings using factor analyses, tests of item scaling, internal consistency reliability analyses, Rasch item response theory modeling, residual component analysis, and modified parallel analysis. RESULTS: Results from an initial exploratory factor analysis produced 3 distinct, interpretable factors that accounted for 72% of the variance: Applied Cognition (44%), Personal Care & Instrumental Activities (19%), and Physical & Movement Activities (9%); these 3 activity factors were verified by a confirmatory factor analysis. Scaling assumptions were met for each factor in the total sample and across diagnostic groups. Internal consistency reliability was high for the total sample (Cronbach alpha = 0.92 to 0.94), and for specific diagnostic groups (Cronbach alpha = 0.90 to 0.95). Rasch scaling, residual factor, differential item functioning, and modified parallel analyses supported the unidimensionality and goodness of fit of each unique activity domain. CONCLUSIONS: This 3-factor model of the AM-PAC can form the conceptual basis for common-item equating and computer-adaptive applications, leading to a comprehensive system of outcome instruments for postacute care settings.10a*Self Efficacy10a*Sickness Impact Profile10aActivities of Daily Living/*classification/psychology10aAdult10aAftercare/*standards/statistics & numerical data10aAged10aBoston10aCognition/physiology10aDisability Evaluation10aFactor Analysis, Statistical10aFemale10aHuman10aMale10aMiddle Aged10aMovement/physiology10aOutcome Assessment (Health Care)/*methods/statistics & numerical data10aPsychometrics10aQuestionnaires/standards10aRehabilitation/*standards/statistics & numerical data10aReproducibility of Results10aSensitivity and Specificity10aSupport, U.S. Gov't, Non-P.H.S.10aSupport, U.S. Gov't, P.H.S.1 aHaley, S M1 aCoster, W J1 aAndres, P L1 aLudlow, L H1 aNi, P1 aBond, T L1 aSinclair, S J1 aJette, A M uhttp://mail.iacat.org/content/activity-outcome-measurement-postacute-care00512nas a2200133 4500008004100000245005100041210005100092260009900143100001700242700001600259700001400275700000800289856008100297 2004 eng d00aComputerized adaptive testing and item banking0 aComputerized adaptive testing and item banking aP. M. Fayers and R. D. Hays (Eds.) Assessing Quality of Life. Oxford: Oxford University Press.1 aBjorner, J B1 aKosinski, M1 aWare, J E1 aJr. 
 uhttp://mail.iacat.org/content/computerized-adaptive-testing-and-item-banking01251nas a2200157 4500008003900000245006400039210006300103300001200166490000700178520076400185100002500949700002200974700002200996700002201018856005301040 2004 d00aComputerized Adaptive Testing With Multiple-Form Structures0 aComputerized Adaptive Testing With MultipleForm Structures a147-1640 v283 aA multiple-form structure (MFS) is an ordered collection or network of testlets (i.e., sets of items). An examinee’s progression through the network of testlets is dictated by the correctness of an examinee’s answers, thereby adapting the test to his or her trait level. The collection of paths through the network yields the set of all possible test forms, allowing test specialists the opportunity to review them before they are administered. Also, limiting the exposure of an individual MFS to a specific period of time can enhance test security. This article provides an overview of methods that have been developed to generate parallel MFSs. The approach is applied to the assembly of an experimental computerized Law School Admission Test (LSAT).
1 aArmstrong, Ronald, D1 aJones, Douglas, H1 aKoppel, Nicole, B1 aPashley, Peter, J uhttp://apm.sagepub.com/content/28/3/147.abstract01544nas a2200229 4500008004100000020002200041245006400063210006300127260002600190300001200216490000700228520081800235653003401053653003001087653002801117653001301145100001901158700001501177700001601192700001701208856008901225 2004 eng d a0146-6216 (Print)00aComputerized adaptive testing with multiple-form structures0 aComputerized adaptive testing with multipleform structures bSage Publications: US a147-1640 v283 aA multiple-form structure (MFS) is an ordered collection or network of testlets (i.e., sets of items). An examinee's progression through the network of testlets is dictated by the correctness of an examinee's answers, thereby adapting the test to his or her trait level. The collection of paths through the network yields the set of all possible test forms, allowing test specialists the opportunity to review them before they are administered. Also, limiting the exposure of an individual MFS to a specific period of time can enhance test security. This article provides an overview of methods that have been developed to generate parallel MFSs. The approach is applied to the assembly of an experimental computerized Law School Admission Test (LSAT). (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive testing10aLaw School Admission Test10amultiple-form structure10atestlets1 aArmstrong, R D1 aJones, D H1 aKoppel, N B1 aPashley, P J uhttp://mail.iacat.org/content/computerized-adaptive-testing-multiple-form-structures00614nas a2200121 4500008004100000245012400041210006900165260010500234100001200339700001200351700001100363856011800374 2004 eng d00aAn investigation of two combination procedures of SPRT for three-category decisions in computerized classification test0 ainvestigation of two combination procedures of SPRT for threecat aPaper presented at the annual meeting of the American Educational Research Association, San Diego CA1 aJiao, H1 aWang, S1 aLau, A uhttp://mail.iacat.org/content/investigation-two-combination-procedures-sprt-three-category-decisions-computerized00710nas a2200157 4500008004100000245013900041210006900180260003200249653003400281653004000315653004100355100001200396700001200408700001200420856012000432 2004 eng d00aAn investigation of two combination procedures of SPRT for three-category classification decisions in computerized classification test0 ainvestigation of two combination procedures of SPRT for threecat aSan Antonio, Texasc04/200410acomputerized adaptive testing10aComputerized classification testing10asequential probability ratio testing1 aJiao, H1 aWang, S1 aLau, CA uhttp://mail.iacat.org/content/investigation-two-combination-procedures-sprt-three-category-classification-decisions00502nas a2200109 4500008004100000245006800041210006400109260010600173100001200279700001300291856008800304 2003 eng d00aThe effects of model misfit in computerized classification test0 aeffects of model misfit in computerized classification test aPaper presented at the annual meeting of the National Council on Measurement in Education, Chicago IL1 aJiao, H1 aLau, A C uhttp://mail.iacat.org/content/effects-model-misfit-computerized-classification-test02706nas a2200121 4500008004100000245014800041210006900189300000800258490000700266520217600273100001202449856012302461 2003 eng d00aThe effects of model specification error in item response theory-based computerized classification test using sequential probability ratio test0 
aeffects of model specification error in item response theorybase a4780 v643 aThis study investigated the effects of model specification error on classification accuracy, error rates, and average test length in Item Response Theory (IRT) based computerized classification test (CCT) using sequential probability ratio test (SPRT) in making binary decisions from examinees' dichotomous responses. This study consisted of three sub-studies. In each sub-study, one of the three unidimensional dichotomous IRT models, the 1-parameter logistic (1PL), the 2-parameter logistic (2PL), and the 3-parameter logistic (3PL) model was set as the true model and the other two models were treated as the misfit models. Item pool composition, test length, and stratum depth were manipulated to simulate different test conditions. To ensure the validity of the study results, the true model based CCTs using the true and the recalibrated item parameters were compared first to study the effect of estimation error in item parameters in CCTs. Then, the true model and the misfit model based CCTs were compared to accomplish the research goal. The results indicated that estimation error in item parameters did not affect classification results based on CCTs using SPRT. The effect of model specification error depended on the true model, the misfit model, and the item pool composition. When the 1PL or the 2PL IRT model was the true model, the use of another IRT model had little impact on the CCT results. When the 3PL IRT model was the true model, the use of the 1PL model raised the false positive error rates. The influence of using the 2PL instead of the 3PL model depended on the item pool composition. When the item discrimination parameters varied greatly from one, the use of the 2PL IRT model raised the false negative error rates to above the nominal level. In the simulated test conditions with test length and item exposure constraints, using a misfit model in CCTs most often affected the average test length. Its effects on error rates and classification accuracy were negligible. It was concluded that in CCTs using SPRT, IRT model selection and evaluation is indispensable. (PsycINFO Database Record (c) 2004 APA, all rights reserved).1 aJiao, H uhttp://mail.iacat.org/content/effects-model-specification-error-item-response-theory-based-computerized-classification00522nas a2200097 4500008004100000245008500041210006900126260010600195100001400301856010900315 2003 eng d00aA multidimensional IRT mechanism for better understanding adaptive test behavior0 amultidimensional IRT mechanism for better understanding adaptive aPaper presented at the annual meeting of the National Council on Measurement in Education, Chicago IL1 aJodoin, M uhttp://mail.iacat.org/content/multidimensional-irt-mechanism-better-understanding-adaptive-test-behavior02715nas a2200121 4500008004100000245010500041210006900146300000900215490000700224520221900231100001602450856012702466 2003 eng d00aPsychometric properties of several computer-based test designs with ideal and constrained item pools0 aPsychometric properties of several computerbased test designs wi a29780 v643 aThe purpose of this study was to compare linear fixed length test (LFT), multi stage test (MST), and computer adaptive test (CAT) designs under three levels of item pool quality, two levels of match between test and item pool content specifications, two levels of test length, and several levels of exposure control expected to be practical for a number of testing programs. 
This design resulted in 132 conditions, which were evaluated in a simulation study with 9000 examinees on several measures of overall measurement precision (reliability, and the mean error and root mean squared error between true and estimated ability levels), classification precision (decision accuracy, false positive and false negative rates, and Kappa for cut scores corresponding to 30%, 50%, and 85% failure rates), and conditional measurement precision (the root mean squared error between true and estimated ability levels conditioned on 25 true ability levels). Test reliability, overall and conditional measurement precision, and classification precision increased with item pool quality and test length, and decreased with a less adequate match between the item pool and the test specifications. In addition, as the maximum exposure rate decreased and the type of exposure control implemented became more restrictive, test reliability, overall and conditional measurement precision, and classification precision decreased. Within item pool quality, match between test and item pool content specifications, test length, and exposure control, CAT designs showed superior psychometric properties as compared to MST designs, which in turn were superior to LFT designs. However, some caution is warranted in interpreting these results since the ability of the automated test assembly software to construct tests that met specifications was limited in conditions where pool usage was high. The practical importance of the differences between test designs on the evaluation criteria studied is discussed with respect to the inferences test users seek to make from test scores and nonpsychometric factors that may be important in some testing programs. (PsycINFO Database Record (c) 2004 APA, all rights reserved).1 aJodoin, M G uhttp://mail.iacat.org/content/psychometric-properties-several-computer-based-test-designs-ideal-and-constrained-item-pools00534nas a2200121 4500008004100000245010900041210006900150260001900219100001400238700001700252700001800269856012500287 2002 eng d00aComparison of the psychometric properties of several computer-based test designs for credentialing exams0 aComparison of the psychometric properties of several computerbas aNew Orleans LA1 aJodoin, M1 aZenisky, A L1 aHambleton, RK uhttp://mail.iacat.org/content/comparison-psychometric-properties-several-computer-based-test-designs-credentialing-exams00554nas a2200121 4500008004100000245013500041210006900176260001900245100001800264700001400282700001700296856011900313 2002 eng d00aImpact of selected factors on the psychometric quality of credentialing examinations administered with a sequential testlet design0 aImpact of selected factors on the psychometric quality of creden aNew Orleans LA1 aHambleton, RK1 aJodoin, M1 aZenisky, A L uhttp://mail.iacat.org/content/impact-selected-factors-psychometric-quality-credentialing-examinations-administered00493nas a2200097 4500008004100000245012000041210006900161260002400230100001600254856012500270 2002 eng d00aReliability and decision accuracy of linear parallel form and multi stage tests with realistic and ideal item pools0 aReliability and decision accuracy of linear parallel form and mu aWinchester, England1 aJodoin, M G uhttp://mail.iacat.org/content/reliability-and-decision-accuracy-linear-parallel-form-and-multi-stage-tests-realistic-and00494nas a2200121 4500008004100000245009500041210006900136260001500205100001800220700001400238700001900252856010100271 2001 eng d00aAn 
investigation of the impact of items that exhibit mild DIF on ability estimation in CAT0 ainvestigation of the impact of items that exhibit mild DIF on ab aSeattle WA1 aJennings, J A1 aDodd, B G1 aFitzpatrick, S uhttp://mail.iacat.org/content/investigation-impact-items-exhibit-mild-dif-ability-estimation-cat00545nas a2200157 4500008004100000245006500041210006300106260002700169490000700196653003400203100001700237700001200254700001200266700001700278856009200295 2000 eng d00aComputer-adaptive testing: A methodology whose time has come0 aComputeradaptive testing A methodology whose time has come aChicago, IL. USAbMESA0 v6910acomputerized adaptive testing1 aLinacre, J M1 aKang, U1 aJean, E1 aLinacre, J M uhttp://mail.iacat.org/content/computer-adaptive-testing-methodology-whose-time-has-come00471nas a2200109 4500008004100000245009400041210006900135100001200204700001300216700001900229856011300248 1998 eng d00aA comparison of two methods of controlling item exposure in computerized adaptive testing0 acomparison of two methods of controlling item exposure in comput1 aTang, L1 aJiang, H1 aChang, Hua-Hua uhttp://mail.iacat.org/content/comparison-two-methods-controlling-item-exposure-computerized-adaptive-testing00507nas a2200109 4500008004100000245008000041210006900121260006100190100001900251700001500270856011200285 1998 eng d00aComputer adaptive testing – Approaches for item selection and measurement0 aComputer adaptive testing Approaches for item selection and meas aRutgers Center for Operations Research, New Brunswick NJ1 aArmstrong, R D1 aJones, D H uhttp://mail.iacat.org/content/computer-adaptive-testing-%E2%80%93-approaches-item-selection-and-measurement00447nas a2200121 4500008004100000245006400041210006400105260001500169100001900184700001500203700001600218856009100234 1998 eng d00aComputerized adaptive testing with multiple form structures0 aComputerized adaptive testing with multiple form structures aUrbana, IL1 aArmstrong, R D1 aJones, D H1 aBerliner, N uhttp://mail.iacat.org/content/computerized-adaptive-testing-multiple-form-structures-000599nas a2200133 4500008004100000245012500041210006900166260004600235100001400281700001600295700001600311700001400327856012400341 1998 eng d00aThe relationship between computer familiarity and performance on computer-based TOEFL test tasks (Research Report 98-08)0 arelationship between computer familiarity and performance on com aPrinceton NJ: Educational Testing Service1 aTaylor, C1 aJamieson, J1 aEignor, D R1 aKirsch, I uhttp://mail.iacat.org/content/relationship-between-computer-familiarity-and-performance-computer-based-toefl-test-tasks00458nas a2200121 4500008004100000245006900041210006700110260001500177100001600192700001600208700001600224856009600240 1997 eng d00aAssessing speededness in variable-length computer-adaptive tests0 aAssessing speededness in variablelength computeradaptive tests aChicago IL1 aBontempo, B1 aJulian, E R1 aGorham, J L uhttp://mail.iacat.org/content/assessing-speededness-variable-length-computer-adaptive-tests00579nas a2200133 4500008003900000245013100039210006900170100001700239700001500256700001700271700001400288700001700302856012600319 1997 d00aEvaluating an automatically scorable, open-ended response type for measuring mathematical reasoning in computer-adaptive tests0 aEvaluating an automatically scorable openended response type for1 aBennett, R E1 aSteffen, M1 aSingley, M K1 aMorley, M1 aJacquemin, D 
uhttp://mail.iacat.org/content/evaluating-automatically-scorable-open-ended-response-type-measuring-mathematical-reasoning00411nas a2200097 4500008004100000245007300041210006900114260001500183100001500198856010000213 1997 eng d00aMathematical programming approaches to computerized adaptive testing0 aMathematical programming approaches to computerized adaptive tes aChicago IL1 aJones, D H uhttp://mail.iacat.org/content/mathematical-programming-approaches-computerized-adaptive-testing00865nas a2200205 4500008004100000245004600041210004600087260001200133300000800145490000600153520029600159653002900455653001500484653001100499653001800510653002400528653001800552100001700570856007200587 1996 eng d00aDispelling myths about the new NCLEX exam0 aDispelling myths about the new NCLEX exam cJan-Feb a6-70 v93 aThe new computerized NCLEX system is working well. Most new candidates, employers, and board of nursing representatives like the computerized adaptive testing system and the fast report of results. But, among the candidates themselves some myths have grown which cause them needless anxiety.10a*Educational Measurement10a*Licensure10aHumans10aNursing Staff10aPersonnel Selection10aUnited States1 aJohnson, S H uhttp://mail.iacat.org/content/dispelling-myths-about-new-nclex-exam00370nas a2200085 4500008004100000245006700041210006700108100001800175856009100193 1995 eng d00aShortfall of questions curbs use of computerized graduate exam0 aShortfall of questions curbs use of computerized graduate exam1 aJacobson, R L uhttp://mail.iacat.org/content/shortfall-questions-curbs-use-computerized-graduate-exam00758nas a2200241 4500008004100000020002200041245006700063210006400130250001500194260000800209300001100217490000700228653001500235653002600250653003700276653002300313653001800336100002100354700001400375700002400389700001400413856008900427 1993 eng d a0744-6314 (Print)00aMoving in a new direction: Computerized adaptive testing (CAT)0 aMoving in a new direction Computerized adaptive testing CAT a1993/01/01 cJan a80, 820 v2410a*Computers10aAccreditation/methods10aEducational Measurement/*methods10aLicensure, Nursing10aUnited States1 aJones-Dickson, C1 aDorsey, D1 aCampbell-Warnock, J1 aFields, F uhttp://mail.iacat.org/content/moving-new-direction-computerized-adaptive-testing-cat00431nas a2200121 4500008004100000245006600041210006600107250000600173300001400179490000700193100001800200856009100218 1993 eng d00aNew computer technique seen producing a revolution in testing0 aNew computer technique seen producing a revolution in testing a4 a22-23, 260 v401 aJacobson, R L uhttp://mail.iacat.org/content/new-computer-technique-seen-producing-revolution-testing00514nas a2200145 4500008004100000245007700041210006900118300001200187490000700199100001400206700001500220700001700235700001400252856010200266 1992 eng d00aA comparison of self-adapted and computerized adaptive achievement tests0 acomparison of selfadapted and computerized adaptive achievement a329-3390 v291 aWise, S L1 aPlake, S S1 aJohnson, P L1 aRoos, S L uhttp://mail.iacat.org/content/comparison-self-adapted-and-computerized-adaptive-achievement-tests00511nas a2200109 4500008004100000245008000041210006900121260007600190100001800266700001200284856010500296 1992 eng d00aThe Language Training Division's computer adaptive reading proficiency test0 aLanguage Training Divisions computer adaptive reading proficienc aProvo, UT: Language Training Division, Office of Training and Education1 aJanczewski, D1 aLowe, P 
uhttp://mail.iacat.org/content/language-training-divisions-computer-adaptive-reading-proficiency-test00500nas a2200145 4500008004100000245007200041210006900113300001000182490000600192100001700198700001400215700001400229700001500243856009600258 1991 eng d00aCorrelates of examinee item choice behavior in self-adapted testing0 aCorrelates of examinee item choice behavior in selfadapted testi a25-280 v41 aJohnson, J L1 aRoos, L L1 aWise, S L1 aPlake, B S uhttp://mail.iacat.org/content/correlates-examinee-item-choice-behavior-self-adapted-testing00437nas a2200109 4500008004100000245006300041210006000104260004600164100002000210700001500230856008200245 1980 eng d00aAn empirical study of a broad range test of verbal ability0 aempirical study of a broad range test of verbal ability aPrinceton NJ: Educational Testing Service1 aKreitzberg, C B1 aJones, D J uhttp://mail.iacat.org/content/empirical-study-broad-range-test-verbal-ability00726nas a2200109 4500008004100000245011500041210006900156260023700225100001700462700001400479856012300493 1980 eng d00aParallel forms reliability and measurement accuracy comparison of adaptive and conventional testing strategies0 aParallel forms reliability and measurement accuracy comparison o aD. J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 16-34). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory.1 aJohnson, M J1 aWeiss, DJ uhttp://mail.iacat.org/content/parallel-forms-reliability-and-measurement-accuracy-comparison-adaptive-and-conventional00400nas a2200097 4500008004100000245007100041210006900112260001300181100001700194856009100211 1979 eng d00aStudent reaction to computerized adaptive testing in the classroom0 aStudent reaction to computerized adaptive testing in the classro aNew York1 aJohnson, M J uhttp://mail.iacat.org/content/student-reaction-computerized-adaptive-testing-classroom00434nas a2200109 4500008004100000245007700041210006900118300001200187490000600199100001700205856010200222 1977 En d00aBayesian Tailored Testing and the Influence of Item Bank Characteristics0 aBayesian Tailored Testing and the Influence of Item Bank Charact a111-1200 v11 aJensema, C J uhttp://mail.iacat.org/content/bayesian-tailored-testing-and-influence-item-bank-characteristics-100432nas a2200109 4500008004100000245007700041210006900118300001200187490000600199100001700205856010000222 1977 eng d00aBayesian tailored testing and the influence of item bank characteristics0 aBayesian tailored testing and the influence of item bank charact a111-1200 v11 aJensema, C J uhttp://mail.iacat.org/content/bayesian-tailored-testing-and-influence-item-bank-characteristics00557nas a2200097 4500008004100000245007700041210006900118260015300187100001700340856010200357 1976 eng d00aBayesian tailored testing and the influence of item bank characteristics0 aBayesian tailored testing and the influence of item bank charact aC. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 82-89). Washington DC: U.S. 
Government Printing Office.1 aJensema, C J uhttp://mail.iacat.org/content/bayesian-tailored-testing-and-influence-item-bank-characteristics-000368nas a2200109 4500008004100000245005400041210005100095300001000146490000700156100001700163856007800180 1974 eng d00aAn application of latent trait mental test theory0 aapplication of latent trait mental test theory a29-480 v271 aJensema, C J uhttp://mail.iacat.org/content/application-latent-trait-mental-test-theory00741nas a2200145 4500008004100000245018400041210006900225260010800294100001600402700001700418700001500435700001100450700001200461856012200473 1974 eng d00aComputer-based adaptive testing models for the Air Force technical training environment: Phase I: Development of a computerized measurement system for Air Force technical Training0 aComputerbased adaptive testing models for the Air Force technica aJSAS Catalogue of Selected Documents in Psychology, 5, 1-86 (MS No. 882). AFHRL Technical Report 74-48.1 aHansen, D N1 aJohnson, B F1 aFagan, R L1 aTan, P1 aDick, W uhttp://mail.iacat.org/content/computer-based-adaptive-testing-models-air-force-technical-training-environment-phase-i00344nas a2200109 4500008004100000245004600041210004200087300001200129490000700141100001700148856006900165 1974 eng d00aThe validity of Bayesian tailored testing0 avalidity of Bayesian tailored testing a757-7560 v341 aJensema, C J uhttp://mail.iacat.org/content/validity-bayesian-tailored-testing00442nas a2200109 4500008004100000245004100041210004000082260011100122100001300233700001500246856007100261 1973 eng d00aComputer-based psychological testing0 aComputerbased psychological testing aA. Elithorn and D. Jones (Eds.), Artificial and human thinking (pp. 83-93). San Francisco CA: Jossey-Bass.1 aJones, D1 aWeinman, J uhttp://mail.iacat.org/content/computer-based-psychological-testing00506nas a2200097 4500008004100000245010000041210006900141260006400210100001700274856011700291 1972 eng d00aAn application of latent trait mental test theory to the Washington Pre-College Testing Battery0 aapplication of latent trait mental test theory to the Washington aUnpublished doctoral dissertation, University of Washington1 aJensema, C J uhttp://mail.iacat.org/content/application-latent-trait-mental-test-theory-washington-pre-college-testing-battery