TY - JOUR T1 - Development of a Computerized Adaptive Test for Anxiety Based on the Dutch–Flemish Version of the PROMIS Item Bank JF - Assessment Y1 - In Press A1 - Gerard Flens A1 - Niels Smits A1 - Caroline B. Terwee A1 - Joost Dekker A1 - Irma Huijbrechts A1 - Philip Spinhoven A1 - Edwin de Beurs AB - We used the Dutch–Flemish version of the USA PROMIS adult V1.0 item bank for Anxiety as input for developing a computerized adaptive test (CAT) to measure the entire latent anxiety continuum. First, psychometric analysis of a combined clinical and general population sample (N = 2,010) showed that the 29-item bank has psychometric properties that are required for a CAT administration. Second, a post hoc CAT simulation showed efficient and highly precise measurement, with an average number of 8.64 items for the clinical sample, and 9.48 items for the general population sample. Furthermore, the accuracy of our CAT version was highly similar to that of the full item bank administration, both in final score estimates and in distinguishing clinical subjects from persons without a mental health disorder. We discuss the future directions and limitations of CAT development with the Dutch–Flemish version of the PROMIS Anxiety item bank. UR - https://doi.org/10.1177/1073191117746742 ER - TY - JOUR T1 - How Do Trait Change Patterns Affect the Performance of Adaptive Measurement of Change? JF - Journal of Computerized Adaptive Testing Y1 - 2023 A1 - Ming Him Tai A1 - Allison W. Cooperman A1 - Joseph N. DeWeese A1 - David J. Weiss KW - adaptive measurement of change KW - computerized adaptive testing KW - longitudinal measurement KW - trait change patterns VL - 10 IS - 3 ER - TY - JOUR T1 - Computerized adaptive testing to screen children for emotional and behavioral problems by preventive child healthcare JF - BMC Pediatrics Y1 - 2020 A1 - Theunissen, Meninou H.C. A1 - de Wolff, Marianne S. A1 - Deurloo, Jacqueline A. A1 - Vogels, Anton G. C. AB -

Background

Questionnaires to detect emotional and behavioral problems (EBP) in Preventive Child Healthcare (PCH) should be short, which potentially affects their validity and reliability. Simulation studies have shown that Computerized Adaptive Testing (CAT) could overcome these weaknesses. We studied the applicability (in terms of participation rate, satisfaction, and efficiency) and the validity of CAT in routine PCH practice.

Methods

We analyzed data on 461 children aged 10–11 years (response rate 41%) who were assessed during routine well-child examinations by PCH professionals. Before the visit, parents completed the CAT and the Child Behavior Checklist (CBCL). Satisfaction was measured by parent and PCH professional report. Efficiency of the CAT procedure was measured as the number of items needed to assess whether or not a child has serious problems. Validity was assessed using the CBCL as the criterion.

Results

Parents and PCH professionals rated the CAT as good on average. The procedure required on average 16 items to assess whether or not a child has serious problems. Agreement of scores on the CAT scales with the corresponding CBCL scales was high (Spearman correlations ranging from 0.59 to 0.72). Areas under the curve (AUCs) were high (range: 0.95–0.97) for the Psycat total, externalizing, and hyperactivity scales, using the corresponding CBCL scale scores as the criterion. For the Psycat internalizing scale, the AUC was somewhat lower but still high (0.86).

Conclusions

CAT is a valid procedure for the identification of emotional and behavioral problems in children aged 10–11 years. It may support the efficient and accurate identification of children with overall, and potentially also specific, emotional and behavioral problems in routine PCH.

VL - 20 UR - https://bmcpediatr.biomedcentral.com/articles/10.1186/s12887-020-2018-1 IS - Article number: 119 ER - TY - JOUR T1 - Stratified Item Selection Methods in Cognitive Diagnosis Computerized Adaptive Testing JF - Applied Psychological Measurement Y1 - 2020 A1 - Jing Yang A1 - Hua-Hua Chang A1 - Jian Tao A1 - Ningzhong Shi AB - Cognitive diagnostic computerized adaptive testing (CD-CAT) aims to obtain more useful diagnostic information by taking advantage of computerized adaptive testing (CAT). Cognitive diagnosis models (CDMs) have been developed to classify examinees into the correct proficiency classes so that remediation can be more efficient, whereas CAT tailors optimal items to the examinee’s mastery profile. The item selection method is the key factor in the CD-CAT procedure. In recent years, a large number of parametric and nonparametric item selection methods have been proposed. In this article, the authors propose a series of stratified item selection methods for CD-CAT, which combine stratification with the posterior-weighted Kullback–Leibler (PWKL), nonparametric item selection (NPS), and weighted nonparametric item selection (WNPS) methods and are named S-PWKL, S-NPS, and S-WNPS, respectively. Two types of stratification indices were used: original and novel. The performance of the proposed item selection methods was evaluated via simulation studies and compared with that of the PWKL, NPS, and WNPS methods without stratification. Manipulated conditions included calibration sample size, item quality, number of attributes, number of strata, and data-generating models. Results indicated that the S-WNPS and S-NPS methods performed similarly, and both outperformed the S-PWKL method. In addition, item selection methods with novel stratification indices performed slightly better than those with original stratification indices, and methods without stratification performed the worst.
VL - 44 UR - https://doi.org/10.1177/0146621619893783 ER - TY - JOUR T1 - Nonparametric CAT for CD in Educational Settings With Small Samples JF - Applied Psychological Measurement Y1 - 2019 A1 - Yuan-Pei Chang A1 - Chia-Yi Chiu A1 - Rung-Ching Tsai AB - Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation. Although model-based CD-CAT is relatively well researched in the context of large-scale assessment systems, this type of system has not received the same degree of research and development in small-scale settings, such as at the course-based level, where it would be most useful. The main obstacle is that the statistical estimation techniques successfully applied in large-scale assessment require large samples to guarantee reliable calibration of the item parameters and accurate estimation of the examinees’ proficiency class membership. Such samples are simply not obtainable in course-based settings. Therefore, this study proposes a nonparametric item selection (NPS) method that does not require any parameter calibration and thus can be used in small educational programs. The proposed nonparametric CD-CAT uses the nonparametric classification (NPC) method to estimate an examinee’s attribute profile from the examinee’s item responses; the item that best discriminates between the estimated attribute profile and the other attribute profiles is then selected. The simulation results show that the NPS method outperformed the compared parametric CD-CAT algorithms, and the differences were substantial when the calibration samples were small.
VL - 43 UR - https://doi.org/10.1177/0146621618813113 ER - TY - JOUR T1 - Item Selection Methods in Multidimensional Computerized Adaptive Testing With Polytomously Scored Items JF - Applied Psychological Measurement Y1 - 2018 A1 - Dongbo Tu A1 - Yuting Han A1 - Yan Cai A1 - Xuliang Gao AB - Multidimensional computerized adaptive testing (MCAT) has been developed over the past decades, but most MCAT methods can only deal with dichotomously scored items. However, polytomously scored items are widely used in a variety of tests because they provide more information and can assess complex abilities and skills. The purpose of this study is to discuss the item selection algorithms used in MCAT with polytomously scored items (PMCAT). Several promising item selection algorithms used in MCAT are extended to PMCAT, and two new item selection methods are proposed to improve the existing selection strategies. Two simulation studies are conducted to demonstrate the feasibility of the extended and proposed methods. The simulation results show that most of the extended item selection methods for PMCAT are feasible and that the newly proposed item selection methods perform well. Taking item pool security into account, when two dimensions are considered (Study 1), the proposed modified continuous entropy method (MCEM) performs best overall: it yields the lowest item exposure rate while maintaining relatively high accuracy. For higher dimensions (Study 2), results show that mutual information (MUI) and MCEM maintain relatively high estimation accuracy, and the item exposure rates decrease as the correlation increases. VL - 42 UR - https://doi.org/10.1177/0146621618762748 ER - TY - JOUR T1 - Measuring patient-reported outcomes adaptively: Multidimensionality matters! JF - Applied Psychological Measurement Y1 - 2018 A1 - Paap, Muirne C. S. A1 - Kroeze, Karel A. A1 - Glas, C. A. W. A1 - Terwee, C. B. A1 - van der Palen, Job A1 - Veldkamp, Bernard P.
ER - TY - JOUR T1 - On-the-Fly Constraint-Controlled Assembly Methods for Multistage Adaptive Testing for Cognitive Diagnosis JF - Journal of Educational Measurement Y1 - 2018 A1 - Liu, Shuchang A1 - Cai, Yan A1 - Tu, Dongbo AB - This study applied on-the-fly assembled multistage adaptive testing to cognitive diagnosis (CD-OMST). Several module assembly methods for CD-OMST were proposed and compared in terms of measurement precision, test security, and constraint management. The module assembly methods in the study included the maximum priority index method (MPI), the revised maximum priority index (RMPI), the weighted deviation model (WDM), and two revised Monte Carlo methods (R1-MC, R2-MC). Simulation results showed that, on the whole, CD-OMST performs well: it not only has acceptable attribute pattern correct classification rates but also satisfies both statistical and nonstatistical constraints. The RMPI method was generally better than the MPI method, the R2-MC method was generally better than the R1-MC method, and the two revised Monte Carlo methods performed best in terms of test security and constraint management, whereas the RMPI and WDM methods worked best in terms of measurement precision. The study is expected not only to provide information about how to combine MST and CD using an on-the-fly method and how these assembly methods perform relative to each other in CD-OMST, but also to offer guidance for practitioners assembling modules in CD-OMST with both statistical and nonstatistical constraints. VL - 55 UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12194 ER - TY - CONF T1 - Computerized Adaptive Testing for Cognitive Diagnosis in Classroom: A Nonparametric Approach T2 - IACAT 2017 Conference Y1 - 2017 A1 - Yuan-Pei Chang A1 - Chia-Yi Chiu A1 - Rung-Ching Tsai KW - CD-CAT KW - non-parametric approach AB -

In the past decade, cognitive diagnosis models (CDMs) of educational test performance have received increasing attention among educational researchers (for details, see Fu & Li, 2007, and Rupp, Templin, & Henson, 2010). CDMs decompose the ability domain of a given test into specific skills, called attributes, each of which an examinee may or may not have mastered. The resulting attribute profile documents the individual’s strengths and weaknesses within the ability domain. Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation (e.g., Cheng & Chang, 2007; Cheng, 2009; Liu, You, Wang, Ding, & Chang, 2013; Tatsuoka & Tatsuoka, 1997). While model-based CD-CAT is relatively well researched in the context of large-scale assessments, this type of system has not received the same degree of development in small-scale settings, where it would be most useful. The main challenge is that the statistical estimation techniques successfully applied in parametric CD-CAT require large samples to guarantee reliable calibration of item parameters and accurate estimation of examinees’ attribute profiles. In response to this challenge, a nonparametric approach that does not require any parameter calibration, and thus can be used in small educational programs, is proposed. The proposed nonparametric CD-CAT relies on the same principle as the regular CAT algorithm but uses the nonparametric classification method (Chiu & Douglas, 2013) to assess and update the student’s ability state as the test proceeds. Based on a student’s initial responses, a neighborhood of candidate proficiency classes is identified, and items not characteristic of the chosen proficiency classes are precluded from being chosen next. The response to the next item then allows the skill profile to be updated, and the set of possible proficiency classes is further narrowed.
In this manner, the nonparametric CD-CAT cycles through item administration and update stages until the most likely proficiency class has been pinpointed. The simulation results show that the proposed method outperformed the compared parametric CD-CAT algorithms and the differences were significant when the item parameter calibration was not optimal.
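The nonparametric CD-CAT cycle described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes the conjunctive (DINA-type) ideal response rule used by the nonparametric classification method of Chiu and Douglas (2013), and the selection rule in the hypothetical `select_next_item` (prefer the item that separates the current best profile from the tied candidates, then from profiles within a hypothetical `radius` of the best distance) is an illustrative simplification. The Q-matrix and the error-free simulated examinee are made up.

```python
from itertools import product


def ideal_response(alpha, q_row):
    """Conjunctive (DINA-type) ideal response: 1 only if the profile masters
    every attribute the item requires (q_row[k] == 1 implies alpha[k] == 1)."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))


def all_profiles(n_attributes):
    """All 2**K binary attribute profiles in a fixed enumeration order."""
    return [tuple(p) for p in product((0, 1), repeat=n_attributes)]


def npc_classify(responses, Q, profiles):
    """Nonparametric classification: return the profile(s) whose ideal
    responses on the administered items (responses: dict item -> 0/1) are at
    minimal Hamming distance, plus the distance for every profile."""
    distances = {
        alpha: sum(abs(y - ideal_response(alpha, Q[j])) for j, y in responses.items())
        for alpha in profiles
    }
    best = min(distances.values())
    return [alpha for alpha, d in distances.items() if d == best], distances


def select_next_item(responses, Q, profiles, radius=1):
    """Hypothetical selection rule: choose the unadministered item whose ideal
    response under the current estimate differs from that of as many tied
    candidate profiles as possible; break ties with the wider neighborhood of
    profiles within `radius` of the minimal distance."""
    tied, distances = npc_classify(responses, Q, profiles)
    alpha_hat = tied[0]  # ties broken by enumeration order (assumption)
    best = distances[alpha_hat]
    rivals = tied[1:]
    neighborhood = [a for a, d in distances.items() if d <= best + radius and a != alpha_hat]
    remaining = [j for j in range(len(Q)) if j not in responses]

    def separation(j):
        target = ideal_response(alpha_hat, Q[j])
        return (sum(ideal_response(a, Q[j]) != target for a in rivals),
                sum(ideal_response(a, Q[j]) != target for a in neighborhood))

    return max(remaining, key=separation)  # max keeps the first item among equals


Q = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
profiles = all_profiles(3)
true_alpha = (1, 1, 0)
responses = {}
# Administer items until a single profile is at minimal distance.
while len(npc_classify(responses, Q, profiles)[0]) > 1:
    j = select_next_item(responses, Q, profiles)
    responses[j] = ideal_response(true_alpha, Q[j])  # error-free examinee (assumption)
estimate, _ = npc_classify(responses, Q, profiles)
print(sorted(responses), estimate)  # [0, 1, 2] [(1, 1, 0)]
```

The loop stops as soon as one profile is uniquely closest; with noisy responses, a real application would also need a rule for ties that remain once the bank or a maximum test length is exhausted.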

References

Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74, 619–632.

Cheng, Y., & Chang, H. (2007). The modified maximum global discrimination index method for cognitive diagnostic CAT. In D. Weiss (Ed.), Proceedings of the 2007 GMAC Computerized Adaptive Testing Conference.

Chiu, C.-Y., & Douglas, J. A. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225–250.

Fu, J., & Li, Y. (2007). An integrative review of cognitively diagnostic psychometric models. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago, Illinois.

Liu, H., You, X., Wang, W., Ding, S., & Chang, H. (2013). The development of computerized adaptive testing with cognitive diagnosis for an English achievement test in China. Journal of Classification, 30, 152–172.

Rupp, A. A., Templin, J. L., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford.

Tatsuoka, K. K., & Tatsuoka, M. M. (1997). Computerized cognitive diagnostic adaptive testing: Effect on remedial instruction as empirical validation. Journal of Educational Measurement, 34, 3–20.

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan ER - TY - CONF T1 - Developing a CAT: An Integrated Perspective T2 - IACAT 2017 Conference Y1 - 2017 A1 - Nathan Thompson KW - CAT Development KW - integrated approach AB -

Most resources on computerized adaptive testing (CAT) tend to focus on psychometric aspects such as mathematical formulae for item selection or ability estimation. However, development of a CAT assessment requires a holistic view of project management, financials, content development, product launch and branding, and more. This presentation will develop such a holistic view, which serves several purposes, including providing a framework for validity, estimating costs and ROI, and making better decisions regarding the psychometric aspects.

Thompson and Weiss (2011) presented a 5-step model for developing computerized adaptive tests (CATs). This model will be presented and discussed as the core of this holistic framework and then applied to real-life examples. While most CAT research focuses on developing new quantitative algorithms, this presentation is instead intended to help researchers evaluate and select the algorithms that are most appropriate for their needs. It is therefore ideal for practitioners who are familiar with the basics of item response theory and CAT and wish to explore how they might apply these methodologies to improve their assessments.

Steps include:

1. Feasibility, applicability, and planning studies

2. Develop item bank content or utilize existing bank

3. Pretest and calibrate item bank

4. Determine specifications for final CAT

5. Publish live CAT

For example, Step 1 includes simulation studies that estimate item bank requirements. These estimates can then be used to determine the cost of content development, which in turn can be integrated into an estimated project cost timeline. Such information is vital in determining whether the CAT should be developed in the first place.
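Step 1 can be made concrete with a small Monte Carlo study. The sketch below is a minimal illustration, not the procedure from Thompson and Weiss (2011): it assumes a hypothetical 2PL item bank (discriminations uniform on [1.0, 2.5], difficulties standard normal), a standard normal examinee population, maximum-information item selection, EAP scoring on a grid, and a stopping rule of posterior SD 0.3 with a 30-item cap. All of these settings, and the function names `simulate_cat` and `bank_study`, are placeholders.

```python
import math
import random


def p_2pl(theta, a, b):
    """2PL probability of a keyed (correct) response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta, a, b):
    """2PL Fisher information: a^2 * p * (1 - p)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def simulate_cat(bank, theta_true, rng, se_target=0.3, max_items=30):
    """Run one simulated CAT with maximum-information selection, EAP scoring on
    a grid, and a posterior-SD stopping rule. Returns (items, estimate, SE)."""
    grid = [-4.0 + 0.1 * k for k in range(81)]
    posterior = [math.exp(-t * t / 2.0) for t in grid]  # N(0, 1) prior, unnormalized
    available = list(bank)
    theta_hat, se, n_used = 0.0, float("inf"), 0
    while available and n_used < max_items and se > se_target:
        a, b = max(available, key=lambda item: info_2pl(theta_hat, *item))
        available.remove((a, b))
        keyed = rng.random() < p_2pl(theta_true, a, b)
        # Bayesian update: multiply by this item's likelihood, then renormalize.
        posterior = [w * (p_2pl(t, a, b) if keyed else 1.0 - p_2pl(t, a, b))
                     for t, w in zip(grid, posterior)]
        total = sum(posterior)
        posterior = [w / total for w in posterior]
        theta_hat = sum(t * w for t, w in zip(grid, posterior))
        se = math.sqrt(sum((t - theta_hat) ** 2 * w for t, w in zip(grid, posterior)))
        n_used += 1
    return n_used, theta_hat, se


def bank_study(bank_size, n_examinees=200, se_target=0.3, max_items=30, seed=2017):
    """Mean test length and share of simulees reaching se_target for a
    randomly generated, hypothetical 2PL bank of the given size."""
    rng = random.Random(seed)
    bank = [(rng.uniform(1.0, 2.5), rng.gauss(0.0, 1.0)) for _ in range(bank_size)]
    results = [simulate_cat(bank, rng.gauss(0.0, 1.0), rng, se_target, max_items)
               for _ in range(n_examinees)]
    mean_length = sum(n for n, _, _ in results) / n_examinees
    reached = sum(se <= se_target for _, _, se in results) / n_examinees
    return mean_length, reached


study = {size: bank_study(size) for size in (20, 40, 80, 160)}
for size, (length, reached) in study.items():
    print(f"bank={size:4d}  mean items={length:5.2f}  reached SE target={reached:.0%}")
```

Under these assumptions, mean test length and the share of simulees reaching the precision target can be compared across candidate bank sizes; a program would substitute its own expected item parameters, population, and stopping rule before feeding the results into cost estimates.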

References

Thompson, N. A., & Weiss, D. J. (2011). A Framework for the Development of Computerized Adaptive Tests. Practical Assessment, Research & Evaluation, 16(1). Retrieved from http://pareonline.net/getvn.asp?v=16&n=1.

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1Jv8bpH2zkw5TqSMi03e5JJJ98QtXf-Cv ER - TY - JOUR T1 - Development of a Computer Adaptive Test for Depression Based on the Dutch-Flemish Version of the PROMIS Item Bank JF - Evaluation & the Health Professions Y1 - 2017 A1 - Gerard Flens A1 - Niels Smits A1 - Caroline B. Terwee A1 - Joost Dekker A1 - Irma Huijbrechts A1 - Edwin de Beurs AB - We developed a Dutch-Flemish version of the patient-reported outcomes measurement information system (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As the item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that failed to make the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample (N = 2,010), 8 added items were removed. With the final item bank, we performed several CAT simulations, using various stopping rules, to assess the efficiency of the extended (48 items) and original (28 items) item banks. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development. VL - 40 UR - https://doi.org/10.1177/0163278716684168 ER - TY - CONF T1 - Evaluation of Parameter Recovery, Drift, and DIF with CAT Data T2 - IACAT 2017 Conference Y1 - 2017 A1 - Nathan Thompson A1 - Jordan Stoeger KW - CAT KW - DIF KW - Parameter Drift KW - Parameter Recovery AB -

Parameter drift and differential item functioning (DIF) analyses are frequent components of a test maintenance plan. That is, after one or more test forms are published, organizations will often calibrate post-publication data at a later date to evaluate whether the performance of the items or the test has changed over time. For example, if item content is leaked, the items might gradually become easier over time, and item statistics or parameters can reflect this.

When tests are published under a computerized adaptive testing (CAT) paradigm, they are nearly always calibrated with item response theory (IRT). IRT calibrations assume that range restriction is not an issue, that is, that each item is administered across a range of examinee ability. CAT data violate this assumption, because each item is targeted to examinees whose ability estimates lie near its difficulty. However, some organizations still wish to evaluate the continuing performance of the items from a DIF or drift perspective.

This presentation will evaluate just how inaccurate DIF and drift analyses might be on CAT data, using a Monte Carlo parameter recovery methodology. Known item parameters will be used to generate both linear and CAT data sets, which are then calibrated for DIF and drift. In addition, we will implement randomesque item exposure constraints in some CAT conditions, because this randomization partially alleviates the range restriction problem; whether it improves parameter recovery is an empirical question.
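The randomesque idea mentioned above can be sketched as follows: rather than always administering the single most informative item, the CAT draws at random from the k most informative ones. This is a minimal sketch under stated assumptions, not the presentation's design: a hypothetical bank of 60 2PL items with a common discrimination and evenly spaced difficulties, a standard normal population, 10-item tests, and item selection at each examinee's true θ instead of a running estimate. `mean_ability_sd_per_item` is a hypothetical diagnostic of range restriction: for each item it takes the SD of the abilities of the examinees who received it, then averages over items.

```python
import math
import random
import statistics


def info_2pl(theta, a, b):
    """2PL item information: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def randomesque_select(theta_hat, available, rng, k=5):
    """Randomesque exposure control: draw uniformly at random among the k most
    informative available items at theta_hat (k=1 reduces to maximum information).
    Items are (item_id, a, b) tuples."""
    ranked = sorted(available, key=lambda item: info_2pl(theta_hat, item[1], item[2]),
                    reverse=True)
    return rng.choice(ranked[:k])


def mean_ability_sd_per_item(bank, thetas, test_length, k, seed=0):
    """Average, over items seen at least twice, of the SD of the abilities of the
    examinees who received that item. Simplification (assumption): items are
    selected at each examinee's true theta instead of a running estimate."""
    rng = random.Random(seed)
    seen_by = {item[0]: [] for item in bank}
    for theta in thetas:
        available = list(bank)
        for _ in range(test_length):
            item = randomesque_select(theta, available, rng, k)
            available.remove(item)
            seen_by[item[0]].append(theta)
    sds = [statistics.stdev(v) for v in seen_by.values() if len(v) >= 2]
    return statistics.mean(sds)


# Hypothetical bank: 60 items, common discrimination, evenly spaced difficulties.
bank = [(i, 1.5, -3.0 + 6.0 * i / 59) for i in range(60)]
population_rng = random.Random(42)
thetas = [population_rng.gauss(0.0, 1.0) for _ in range(1000)]
spread = {k: mean_ability_sd_per_item(bank, thetas, test_length=10, k=k) for k in (1, 5)}
for k, sd in spread.items():
    print(f"k={k}: mean within-item ability SD = {sd:.3f}")
```

With k=1 (pure maximum information) each item is seen only by examinees whose abilities lie near its difficulty; with k=5 that range widens. Whether the wider range is enough to improve DIF and drift calibrations is the empirical question the presentation sets out to answer.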

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1F7HCZWD28Q97sCKFIJB0Yps0H66NPeKq ER - TY - JOUR T1 - Item usage in a multidimensional computerized adaptive test (MCAT) measuring health-related quality of life JF - Quality of Life Research Y1 - 2017 A1 - Paap, Muirne C. S. A1 - Kroeze, Karel A. A1 - Terwee, Caroline B. A1 - van der Palen, Job A1 - Veldkamp, Bernard P. VL - 26 UR - https://doi.org/10.1007/s11136-017-1624-3 ER - TY - CONF T1 - New Challenges (With Solutions) and Innovative Applications of CAT T2 - IACAT 2017 Conference Y1 - 2017 A1 - Chun Wang A1 - David J. Weiss A1 - Xue Zhang A1 - Jian Tao A1 - Yinhong He A1 - Ping Chen A1 - Shiyu Wang A1 - Susu Zhang A1 - Haiyan Lin A1 - Xiaohong Gao A1 - Hua-Hua Chang A1 - Zhuoran Shang KW - CAT KW - challenges KW - innovative applications AB -

Over the past several decades, computerized adaptive testing (CAT) has profoundly changed the administration of large-scale aptitude tests, statewide achievement tests, professional licensure exams, and health outcome measures. While many challenges of CAT have been successfully addressed through the continual efforts of researchers in the field, many longstanding challenges remain unresolved. This symposium will begin with three presentations, each of which provides a sound solution to one of these unresolved challenges: (1) item calibration when responses from CAT administration are “missing not at random”; (2) online calibration of new items when person traits have non-ignorable measurement error; and (3) establishing consistency and asymptotic normality of latent trait estimation when item response revision is allowed in CAT. In addition, this symposium features innovative applications of CAT. In particular, there is emerging interest in using cognitive diagnostic CAT to monitor and detect learning progress (4th presentation). Finally, the 5th presentation illustrates the power of multidimensional polytomous CAT, which permits rapid identification of hospitalized patients’ rehabilitative care needs in health outcomes measurement. We believe this symposium covers a wide range of interesting and important topics in CAT.

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1Wvgxw7in_QCq_F7kzID6zCZuVXWcFDPa ER - TY - CONF T1 - A Simulation Study to Compare Classification Method in Cognitive Diagnosis Computerized Adaptive Testing T2 - IACAT 2017 Conference Y1 - 2017 A1 - Jing Yang A1 - Jian Tao A1 - Hua-Hua Chang A1 - Ning-Zhong Shi AB -

Cognitive diagnostic computerized adaptive testing (CD-CAT) combines the strengths of both CAT and cognitive diagnosis. Cognitive diagnosis models, which can be viewed as restricted latent class models, have been developed to classify examinees into the correct profile of mastered and nonmastered skills so that remediation can be more efficient. Chiu and Douglas (2013) introduced a nonparametric procedure that requires only specification of the Q-matrix and classifies examinees by proximity to ideal response patterns. In this article, we compare the nonparametric procedure with common profile estimation methods, such as maximum a posteriori (MAP) estimation, in CD-CAT. The simulation studies consider a variety of Q-matrix structures, numbers of attributes, ways of generating attribute profiles, and levels of item quality. Results indicate that the nonparametric procedure consistently achieves higher pattern and attribute recovery rates in nearly all conditions.
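The comparison described above can be sketched with the two classifiers side by side. This minimal sketch assumes the DINA model (the abstract does not state the generating model) and a uniform prior over profiles, so the MAP estimate coincides with the maximum-likelihood profile; the Q-matrix, slip, and guessing values are hypothetical. Unlike the study, it administers a fixed set of items rather than running a CD-CAT, and it plugs in the true slip and guessing values, so its recovery rates say nothing about the study's conditions.

```python
import math
import random
from itertools import product


def ideal_response(alpha, q_row):
    """Conjunctive ideal response: 1 if alpha masters every attribute the item requires."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))


def all_profiles(n_attributes):
    """All 2**K binary attribute profiles in a fixed enumeration order."""
    return [tuple(p) for p in product((0, 1), repeat=n_attributes)]


def dina_prob(alpha, j, Q, slip, guess):
    """DINA model: P(Y_j = 1 | alpha) is 1 - s_j if the ideal response is 1, else g_j."""
    return 1.0 - slip[j] if ideal_response(alpha, Q[j]) else guess[j]


def map_classify(responses, Q, slip, guess, profiles):
    """MAP profile under a uniform prior, which makes it the maximum-likelihood profile."""
    def log_likelihood(alpha):
        total = 0.0
        for j, y in responses.items():
            p = dina_prob(alpha, j, Q, slip, guess)
            total += math.log(p if y else 1.0 - p)
        return total
    return max(profiles, key=log_likelihood)


def npc_classify(responses, Q, profiles):
    """Nonparametric classification: minimal Hamming distance to the ideal responses."""
    return min(profiles, key=lambda alpha: sum(abs(y - ideal_response(alpha, Q[j]))
                                               for j, y in responses.items()))


def recovery_rates(Q, slip, guess, n_examinees=2000, seed=7):
    """Share of simulees (uniformly drawn true profiles) whose full profile is recovered."""
    rng = random.Random(seed)
    profiles = all_profiles(len(Q[0]))
    hits = {"MAP": 0, "NPC": 0}
    for _ in range(n_examinees):
        true_alpha = rng.choice(profiles)
        responses = {j: int(rng.random() < dina_prob(true_alpha, j, Q, slip, guess))
                     for j in range(len(Q))}
        hits["MAP"] += map_classify(responses, Q, slip, guess, profiles) == true_alpha
        hits["NPC"] += npc_classify(responses, Q, profiles) == true_alpha
    return {method: count / n_examinees for method, count in hits.items()}


# Hypothetical fixed test: each of seven Q-matrix rows used twice.
Q = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)] * 2
slip = [0.1] * len(Q)
guess = [0.2] * len(Q)
rates = recovery_rates(Q, slip, guess)
print(rates)
```

Ties in both classifiers are resolved by enumeration order, which is itself an assumption; the study's own tie-breaking and item selection rules are not given in the abstract.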

References

Chiu, C.-Y., & Douglas, J. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250. doi: 10.1007/s00357-013-9132-9

JF - IACAT 2017 Conference PB - Niigata Seiryo University CY - Niigata, Japan UR - https://drive.google.com/open?id=1jCL3fPZLgzIdwvEk20D-FliZ15OTUtpr ER - TY - JOUR T1 - The validation of a computer-adaptive test (CAT) for assessing health-related quality of life in children and adolescents in a clinical sample: study design, methods and first results of the Kids-CAT study JF - Quality of Life Research Y1 - 2017 A1 - Barthel, D. A1 - Otto, C. A1 - Nolte, S. A1 - Meyrose, A.-K. A1 - Fischer, F. A1 - Devine, J. A1 - Walter, O. A1 - Mierke, A. A1 - Fischer, K. I. A1 - Thyen, U. A1 - Klein, M. A1 - Ankermann, T. A1 - Rose, M. A1 - Ravens-Sieberer, U. AB - Recently, we developed a computer-adaptive test (CAT) for assessing health-related quality of life (HRQoL) in children and adolescents: the Kids-CAT. It measures five generic HRQoL dimensions. The aims of this article were (1) to present the study design and (2) to investigate its psychometric properties in a clinical setting. VL - 26 UR - https://doi.org/10.1007/s11136-016-1437-9 ER - TY - JOUR T1 - Stochastic Curtailment in Adaptive Mastery Testing: Improving the Efficiency of Confidence Interval–Based Stopping Rules JF - Applied Psychological Measurement Y1 - 2015 A1 - Sie, Haskell A1 - Finkelman, Matthew D. A1 - Bartroff, Jay A1 - Thompson, Nathan A. AB - A well-known stopping rule in adaptive mastery testing is to terminate the assessment once the examinee’s ability confidence interval lies entirely above or below the cut-off score. This article proposes new procedures that seek to improve such a variable-length stopping rule by coupling it with curtailment and stochastic curtailment. Under the new procedures, test termination can occur earlier if the probability is high enough that the current classification decision remains the same should the test continue. Computation of this probability utilizes normality of an asymptotically equivalent version of the maximum likelihood ability estimate. 
In two simulation sets, the new procedures showed a substantial reduction in average test length while maintaining similar classification accuracy to the original method. VL - 39 UR - http://apm.sagepub.com/content/39/4/278.abstract ER - TY - JOUR T1 - The Philosophical Aspects of IRT Equating: Modeling Drift to Evaluate Cohort Growth in Large-Scale Assessments JF - Educational Measurement: Issues and Practice Y1 - 2013 A1 - Taherbhai, Husein A1 - Seo, Daeryong KW - cohort growth KW - construct-relevant drift KW - evaluation of scale drift KW - philosophical aspects of IRT equating VL - 32 UR - http://dx.doi.org/10.1111/emip.12000 ER - TY - JOUR T1 - Multistage Computerized Adaptive Testing With Uniform Item Exposure JF - Applied Measurement in Education Y1 - 2012 A1 - Edwards, Michael C. A1 - Flora, David B. A1 - Thissen, David VL - 25 UR - http://www.tandfonline.com/doi/abs/10.1080/08957347.2012.660363 ER - TY - JOUR T1 - Content range and precision of a computer adaptive test of upper extremity function for children with cerebral palsy JF - Physical & Occupational Therapy in Pediatrics Y1 - 2011 A1 - Montpetit, K. A1 - Haley, S. A1 - Bilodeau, N. A1 - Ni, P. A1 - Tian, F. A1 - Gorton, G., 3rd A1 - Mulcahey, M. J. AB - This article reports on the content range and measurement precision of an upper extremity (UE) computer adaptive testing (CAT) platform of physical function in children with cerebral palsy. Upper extremity items representing skills of all abilities were administered to 305 parents. These responses were compared with two traditional standardized measures: Pediatric Outcomes Data Collection Instrument and Functional Independence Measure for Children. The UE CAT correlated strongly with the upper extremity component of these measures and had greater precision when describing individual functional ability. The UE item bank has wider range with items populating the lower end of the ability spectrum. 
This new UE item bank and CAT have the capability to quickly assess children of all ages and abilities with good precision and, most importantly, with items that are meaningful and appropriate for their age and level of physical function. VL - 31 SN - 1541-3144 (Electronic) 0194-2638 (Linking) N1 - Montpetit, Kathleen; Haley, Stephen; Bilodeau, Nathalie; Ni, Pengsheng; Tian, Feng; Gorton, George 3rd; Mulcahey, M J; England; Phys Occup Ther Pediatr. 2011 Feb;31(1):90-102. Epub 2010 Oct 13. JO - Phys Occup Ther Pediatr ER - TY - JOUR T1 - Design of a Computer-Adaptive Test to Measure English Literacy and Numeracy in the Singapore Workforce: Considerations, Benefits, and Implications JF - Journal of Applied Testing Technology Y1 - 2011 A1 - Jacobsen, J. A1 - Ackermann, R. A1 - Egüez, J. A1 - Ganguli, D. A1 - Rickard, P. A1 - Taylor, L. AB -

A computer adaptive test (CAT) is a delivery methodology that serves the larger goals of the assessment system in which it is embedded. A thorough analysis of the assessment system for which a CAT is being designed is critical to ensure that the delivery platform is appropriate and addresses all relevant complexities. As such, a CAT engine must be designed to conform to the validity and reliability requirements of the overall system. This design takes the form of adherence to the assessment goals and objectives of the adaptive assessment system. When the assessment is adapted for use in another country, consideration must be given to any necessary revisions, including content differences. This article addresses these considerations while drawing, in part, on the process followed in the development of the CAT delivery system designed to test English language workplace skills for the Singapore Workforce Development Agency. Topics include item creation and selection, calibration of the item pool, analysis and testing of the psychometric properties, and reporting and interpretation of scores. The characteristics and benefits of the CAT delivery system are detailed, as well as implications for testing programs considering the use of a CAT delivery system.

VL - 12 UR - http://www.testpublishers.org/journal-of-applied-testing-technology IS - 1 ER - TY - JOUR T1 - A framework for the development of computerized adaptive tests JF - Practical Assessment Research & Evaluation Y1 - 2011 A1 - Thompson, N. A. A1 - Weiss, D. J. AB - A substantial amount of research has been conducted over the past 40 years on technical aspects of computerized adaptive testing (CAT), such as item selection algorithms, item exposure controls, and termination criteria. However, there is little literature providing practical guidance on the development of a CAT. This paper seeks to collate some of the available research methodologies into a general framework for the development of any CAT assessment. PB - Practical Assessment Research & Evaluation VL - 16 ER - TY - JOUR T1 - JATT Special Issue on Adaptive Testing: Welcome and Overview JF - Journal of Applied Testing Technology Y1 - 2011 A1 - Thompson, N. A. VL - 12 UR - http://www.testpublishers.org/journal-of-applied-testing-technology ER - TY - JOUR T1 - Using Item Response Theory and Adaptive Testing in Online Career Assessment JF - Journal of Career Assessment Y1 - 2011 A1 - Betz, Nancy E. A1 - Turner, Brandon M. AB -

The present article describes the potential utility of item response theory (IRT) and adaptive testing for scale evaluation and for web-based career assessment. The article describes the principles of both IRT and adaptive testing and then illustrates these with reference to data analyses and simulation studies of the Career Confidence Inventory (CCI). The kinds of information provided by IRT are shown to give a more precise look at scale quality across the trait continuum and also to permit the use of adaptive testing, where the items administered are tailored to the individual being tested. Such tailoring can significantly reduce testing time while maintaining high quality of measurement. This efficiency is especially useful when multiscale inventories and/or a large number of scales are to be administered. Readers are encouraged to consider using these advances in career assessment.
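The point that IRT shows scale quality across the trait continuum can be illustrated with item and test information. The sketch below assumes Samejima's graded response model for Likert-type items (an assumption; the abstract does not name the model fitted to the CCI) and uses made-up item parameters. The conditional standard error at θ is 1/√I(θ), where I(θ) is the sum of the item informations, so regions where it is large are measured less precisely.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def grm_item_information(theta, a, thresholds):
    """Item information under Samejima's graded response model.
    thresholds: increasing category boundaries b_1 < ... < b_m."""
    # Boundary curves P*_k(theta), padded with P*_0 = 1 and P*_{m+1} = 0.
    stars = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Their derivatives a * P* * (1 - P*); the padding terms give 0 automatically.
    slopes = [a * s * (1.0 - s) for s in stars]
    info = 0.0
    for k in range(len(stars) - 1):
        p_k = stars[k] - stars[k + 1]  # probability of responding in category k
        info += (slopes[k] - slopes[k + 1]) ** 2 / p_k
    return info


def conditional_se(theta, items):
    """Standard error of measurement at theta: 1 / sqrt(test information)."""
    return 1.0 / math.sqrt(sum(grm_item_information(theta, a, th) for a, th in items))


# Hypothetical 5-point items: (discrimination, four category thresholds).
items = [
    (1.8, [-2.0, -1.0, 0.0, 1.0]),
    (1.2, [-1.5, -0.5, 0.5, 1.5]),
    (2.2, [-1.0, 0.0, 1.0, 2.0]),
]
profile = {theta: conditional_se(theta, items) for theta in range(-3, 4)}
for theta, se in profile.items():
    print(f"theta={theta:+d}  SE={se:.2f}")
```

An adaptive test uses the same quantity at each step, picking the item with the most information at the current trait estimate; that is how tailoring can shorten a test while keeping precision where it matters for each respondent.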

VL - 19 UR - http://jca.sagepub.com/cgi/content/abstract/19/3/274 ER - TY - CHAP T1 - Computerized adaptive testing by mutual information and multiple imputations Y1 - 2009 A1 - Thissen-Roe, A. AB - Over the years, most computerized adaptive testing (CAT) systems have used score estimation procedures from item response theory (IRT). IRT models have salutary properties for score estimation, error reporting, and next-item selection. However, some testing purposes favor scoring approaches outside IRT. Where a criterion metric is readily available and more relevant than the assessed construct, for example in the selection of job applicants, a predictive model might be appropriate (Scarborough & Somers, 2006). In these cases, neither IRT scoring nor a unidimensional assessment structure can be assumed. Yet, the primary benefit of CAT remains desirable: shorter assessments with minimal loss of accuracy due to unasked items. In such a case, it remains possible to create a CAT system that produces an estimated score from a subset of available items, recognizes differential item information given the emerging item response pattern, and optimizes the accuracy of the score estimated at every successive item. The method of multiple imputations (Rubin, 1987) can be used to simulate plausible scores given plausible response patterns to unasked items (Thissen-Roe, 2005). Mutual information can then be calculated in order to select an optimally informative next item (or set of items). Previously observed response patterns to two complete neural network-scored assessments were resampled according to MIMI CAT item selection. The reproduced CAT scores were compared to full-length assessment scores. Approximately 95% accurate assignment of examinees to one of three score categories was achieved with a 70%-80% reduction in median test length. Several algorithmic factors influencing accuracy and computational performance were examined. CY - D. J. 
Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 179 KB} ER - TY - CHAP T1 - Guess what? Score differences with rapid replies versus omissions on a computerized adaptive test Y1 - 2009 A1 - Talento-Miller, E. A1 - Guo, F. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 215 KB} ER - TY - JOUR T1 - Item Selection in Computerized Classification Testing JF - Educational and Psychological Measurement Y1 - 2009 A1 - Thompson, Nathan A. AB -

Several alternatives for item selection algorithms based on item response theory in computerized classification testing (CCT) have been suggested, with no conclusive evidence on the substantial superiority of a single method. It is argued that the lack of a sizable effect is because some of the methods actually assess items very similarly through different calculations and will usually select the same item. Consideration of methods that assess information across a wider range is often unnecessary under realistic conditions, although it might be advantageous to utilize them only early in a test. In addition, the efficiency of item selection approaches depends on the termination criteria that are used, which is demonstrated through a didactic example and Monte Carlo simulation. Item selection at the cut score, which seems conceptually appropriate for CCT, is not always the most efficient option. A broad framework for item selection in CCT is presented that incorporates these points.

VL - 69 UR - http://epm.sagepub.com/content/69/5/778.abstract ER - TY - JOUR T1 - Item selection in computerized classification testing JF - Educational and Psychological Measurement Y1 - 2009 A1 - Thompson, N. A. AB - Several alternatives for item selection algorithms based on item response theory in computerized classification testing (CCT) have been suggested, with no conclusive evidence on the substantial superiority of a single method. It is argued that the lack of a sizable effect is because some of the methods actually assess items very similarly through different calculations and will usually select the same item. Consideration of methods that assess information across a wider range is often unnecessary under realistic conditions, although it might be advantageous to utilize them only early in a test. In addition, the efficiency of item selection approaches depends on the termination criteria that are used, which is demonstrated through a didactic example and Monte Carlo simulation. Item selection at the cut score, which seems conceptually appropriate for CCT, is not always the most efficient option. A broad framework for item selection in CCT is presented that incorporates these points. VL - 69 SN - 0013-1644 ER - TY - JOUR T1 - Measuring global physical health in children with cerebral palsy: Illustration of a multidimensional bi-factor model and computerized adaptive testing JF - Quality of Life Research Y1 - 2009 A1 - Haley, S. M. A1 - Ni, P. A1 - Dumas, H. M. A1 - Fragala-Pinkham, M. A. A1 - Hambleton, R. K. A1 - Montpetit, K. A1 - Bilodeau, N. A1 - Gorton, G. E. A1 - Watson, K. A1 - Tucker, C. A.
KW - *Computer Simulation KW - *Health Status KW - *Models, Statistical KW - Adaptation, Psychological KW - Adolescent KW - Cerebral Palsy/*physiopathology KW - Child KW - Child, Preschool KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Massachusetts KW - Pennsylvania KW - Questionnaires KW - Young Adult AB - PURPOSE: The purposes of this study were to apply a bi-factor model for the determination of test dimensionality and a multidimensional CAT using computer simulations of real data for the assessment of a new global physical health measure for children with cerebral palsy (CP). METHODS: Parent respondents of 306 children with cerebral palsy were recruited from four pediatric rehabilitation hospitals and outpatient clinics. We compared confirmatory factor analysis results across four models: (1) one-factor unidimensional; (2) two-factor multidimensional (MIRT); (3) bi-factor MIRT with fixed slopes; and (4) bi-factor MIRT with varied slopes. We tested whether the general and content (fatigue and pain) person score estimates could discriminate across severity and types of CP, and whether score estimates from a simulated CAT were similar to estimates based on the total item bank, and whether they correlated as expected with external measures. RESULTS: Confirmatory factor analysis suggested separate pain and fatigue sub-factors; all 37 items were retained in the analyses. From the bi-factor MIRT model with fixed slopes, the full item bank scores discriminated across levels of severity and types of CP, and compared favorably to external instruments. CAT scores based on 10- and 15-item versions accurately captured the global physical health scores. CONCLUSIONS: The bi-factor MIRT CAT application, especially the 10- and 15-item versions, yielded accurate global physical health scores that discriminated across known severity groups and types of CP, and correlated as expected with concurrent measures. 
The CATs have potential for collecting complex data on the physical health of children with CP in an efficient manner. VL - 18 SN - 0962-9343 (Print); 0962-9343 (Linking) N1 - Haley, Stephen M; Ni, Pengsheng; Dumas, Helene M; Fragala-Pinkham, Maria A; Hambleton, Ronald K; Montpetit, Kathleen; Bilodeau, Nathalie; Gorton, George E; Watson, Kyle; Tucker, Carole A; K02 HD045354-01A1/HD/NICHD NIH HHS/United States; K02 HD45354-01A1/HD/NICHD NIH HHS/United States; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Netherlands; Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation; Qual Life Res. 2009 Apr;18(3):359-70. Epub 2009 Feb 17. U2 - 2692519 ER - TY - CHAP T1 - The MEDPRO project: An SBIR project for a comprehensive IRT and CAT software system: CAT software Y1 - 2009 A1 - Thompson, N. A. AB - Development of computerized adaptive tests (CAT) requires a number of appropriate software tools. This paper describes the development of two new CAT software programs. CATSIM has been designed specifically to conduct several different kinds of simulation studies, which are necessary for planning purposes as well as properly designing live CATs. FastCAT is a software system for banking items and publishing CAT tests as standalone files, to be administered anywhere. Both are available for public use. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 283 KB} ER - TY - CHAP T1 - The MEDPRO project: An SBIR project for a comprehensive IRT and CAT software system: IRT software Y1 - 2009 A1 - Thissen, D. AB - IRTPRO (Item Response Theory for Patient-Reported Outcomes) is an entirely new application for item calibration and test scoring using IRT.
IRTPRO implements algorithms for maximum likelihood estimation of item parameters (item calibration) for several unidimensional and multidimensional item response theory (IRT) models for dichotomous and polytomous item responses. In addition, the software provides computation of goodness-of-fit indices, statistics for the diagnosis of local dependence and for the detection of differential item functioning (DIF), and IRT scaled scores. This paper illustrates the use, and some capabilities, of the software. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 817 KB} ER - TY - JOUR T1 - Replenishing a computerized adaptive test of patient-reported daily activity functioning JF - Quality of Life Research Y1 - 2009 A1 - Haley, S. M. A1 - Ni, P. A1 - Jette, A. M. A1 - Tao, W. A1 - Moed, R. A1 - Meyers, D. A1 - Ludlow, L. H. KW - *Activities of Daily Living KW - *Disability Evaluation KW - *Questionnaires KW - *User-Computer Interface KW - Adult KW - Aged KW - Cohort Studies KW - Computer-Assisted Instruction KW - Female KW - Humans KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods AB - PURPOSE: Computerized adaptive testing (CAT) item banks may need to be updated, but before new items can be added, they must be linked to the previous CAT. The purpose of this study was to evaluate 41 pretest items prior to including them in an operational CAT. METHODS: We recruited 6,882 patients with spine, lower extremity, upper extremity, and nonorthopedic impairments who received outpatient rehabilitation in one of 147 clinics across 13 states of the USA. Forty-one new Daily Activity (DA) items were administered along with the Activity Measure for Post-Acute Care Daily Activity CAT (DA-CAT-1) in five separate waves.
We compared the scoring consistency with the full item bank, test information function (TIF), person standard errors (SEs), and content range of the DA-CAT-1 to the new CAT (DA-CAT-2) with the pretest items by real data simulations. RESULTS: We retained 29 of the 41 pretest items. Scores from the DA-CAT-2 were more consistent (ICC = 0.90 versus 0.96) than DA-CAT-1 when compared with the full item bank. TIF and person SEs were improved for persons with higher levels of DA functioning, and ceiling effects were reduced from 16.1% to 6.1%. CONCLUSIONS: Item response theory and online calibration methods were valuable in improving the DA-CAT. VL - 18 SN - 0962-9343 (Print); 0962-9343 (Linking) N1 - Haley, Stephen M; Ni, Pengsheng; Jette, Alan M; Tao, Wei; Moed, Richard; Meyers, Doug; Ludlow, Larry H; K02 HD45354-01/HD/NICHD NIH HHS/United States; Research Support, N.I.H., Extramural; Netherlands; Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation; Qual Life Res. 2009 May;18(4):461-71. Epub 2009 Mar 14. ER - TY - CHAP T1 - Utilizing the generalized likelihood ratio as a termination criterion Y1 - 2009 A1 - Thompson, N. A. AB - Computer-based testing can be used to classify examinees into mutually exclusive groups. Currently, the predominant psychometric algorithm for designing computerized classification tests (CCTs) is the sequential probability ratio test (SPRT; Reckase, 1983) based on item response theory (IRT). The SPRT has been shown to be more efficient than confidence intervals around θ estimates as a method for CCT delivery (Spray & Reckase, 1996; Rudner, 2002). More recently, it was demonstrated that the SPRT, which only uses fixed values, is less efficient than a generalized form which tests whether a given examinee’s θ is below θ1 or above θ2 (Thompson, 2007). This formulation allows the indifference region to vary based on observed data.
Moreover, this composite hypothesis formulation better represents the conceptual purpose of the test, which is to test whether θ is above or below the cutscore. The purpose of this study was to explore the specifications of the new generalized likelihood ratio (GLR; Huang, 2004). As with the SPRT, the efficiency of the procedure depends on the nominal error rates and the distance between θ1 and θ2 (Eggen, 1999). This study utilized a Monte Carlo approach, with 10,000 examinees simulated under each condition, to evaluate differences in efficiency and accuracy due to hypothesis structure, nominal error rate, and indifference region size. The GLR was always at least as efficient as the fixed-point SPRT while maintaining equivalent levels of accuracy. CY - D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF File, 194 KB} ER - TY - JOUR T1 - CAT-MD: Computerized adaptive testing on mobile devices JF - International Journal of Web-Based Learning and Teaching Technologies Y1 - 2008 A1 - Triantafillou, E. A1 - Georgiadou, E. A1 - Economides, A. A. VL - 3 ER - TY - JOUR T1 - Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes JF - Archives of Physical Medicine and Rehabilitation Y1 - 2008 A1 - Haley, S. M. A1 - Gandek, B. A1 - Siebens, H. A1 - Black-Schaffer, R. M. A1 - Sinclair, S. J. A1 - Tao, W. A1 - Coster, W. J. A1 - Ni, P. A1 - Jette, A. M.
KW - *Activities of Daily Living KW - *Adaptation, Physiological KW - *Computer Systems KW - *Questionnaires KW - Adult KW - Aged KW - Aged, 80 and over KW - Chi-Square Distribution KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Longitudinal Studies KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Patient Discharge KW - Prospective Studies KW - Rehabilitation/*standards KW - Subacute Care/*standards AB - OBJECTIVES: To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness. DESIGN: Longitudinal, prospective cohort study of patients interviewed approximately 2 weeks after discharge from inpatient rehabilitation and 3 months later. SETTING: Follow-up interviews conducted in patient's home setting. PARTICIPANTS: Adults (N=94) with diagnoses of neurologic, orthopedic, or medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Participation domains of mobility, domestic life, and community, social, & civic life, measured using a CAT version of the Participation Measure for Postacute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53). RESULTS: The PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (intraclass correlation coefficient, model 3,1, .71-.81). On average, the PM-PAC-CAT was completed in 42% of the time and with only 48% of the items as compared with the PM-PAC-53. Both formats discriminated across functional severity groups. The PM-PAC-CAT had modest reductions in sensitivity and responsiveness to patient-reported change over a 3-month interval as compared with the PM-PAC-53. 
CONCLUSIONS: Although continued evaluation is warranted, accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a sizeable reduction in respondent burden. VL - 89 SN - 1532-821X (Electronic); 0003-9993 (Linking) N1 - Haley, Stephen M; Gandek, Barbara; Siebens, Hilary; Black-Schaffer, Randie M; Sinclair, Samuel J; Tao, Wei; Coster, Wendy J; Ni, Pengsheng; Jette, Alan M; K02 HD045354-01A1/HD/NICHD NIH HHS/United States; K02 HD45354-01/HD/NICHD NIH HHS/United States; R01 HD043568/HD/NICHD NIH HHS/United States; R01 HD043568-01/HD/NICHD NIH HHS/United States; Research Support, N.I.H., Extramural; United States; Archives of physical medicine and rehabilitation; Arch Phys Med Rehabil. 2008 Feb;89(2):275-83. U2 - 2666330 ER - TY - CONF T1 - Developing a progressive approach to using the GAIN in order to reduce the duration and cost of assessment with the GAIN short screener, Quick and computer adaptive testing T2 - Joint Meeting on Adolescent Treatment Effectiveness Y1 - 2008 A1 - Dennis, M. L. A1 - Funk, R. A1 - Titus, J. A1 - Riley, B. B. A1 - Hosman, S. A1 - Kinne, S. JF - Joint Meeting on Adolescent Treatment Effectiveness CY - Washington D.C., USA N1 - ProCite field[6]: Paper presented at the ER - TY - JOUR T1 - The D-optimality item selection criterion in the early stage of CAT: A study with the graded response model JF - Journal of Educational and Behavioral Statistics Y1 - 2008 A1 - Passos, V. L. A1 - Berger, M. P. F. A1 - Tan, F. E. S. KW - computerized adaptive testing KW - D optimality KW - item selection AB - During the early stage of computerized adaptive testing (CAT), item selection criteria based on Fisher’s information often produce less stable latent trait estimates than the Kullback-Leibler global information criterion.
Robustness against early stage instability has been reported for the D-optimality criterion in a polytomous CAT with the Nominal Response Model and is shown herein to be reproducible for the Graded Response Model. For comparative purposes, the A-optimality and the global information criteria are also applied. Their item selection is investigated as a function of test progression and item bank composition. The results indicate how the selection of specific item parameters underlies the criteria's performance, evaluated via accuracy and precision of estimation. In addition, the criteria's item exposure rates are compared, without the use of any exposure-control measure. On account of stability, precision, accuracy, numerical simplicity, and, less evidently, item exposure rate, the D-optimality criterion can be recommended for CAT. VL - 33 ER - TY - JOUR T1 - An initial application of computerized adaptive testing (CAT) for measuring disability in patients with low back pain JF - BMC Musculoskeletal Disorders Y1 - 2008 A1 - Elhan, A. H. A1 - Oztuna, D. A1 - Kutlay, S. A1 - Kucukdeveci, A. A. A1 - Tennant, A. AB - BACKGROUND: Recent approaches to outcome measurement involving Computerized Adaptive Testing (CAT) offer an approach for measuring disability in low back pain (LBP) in a way that can reduce the burden upon patient and professional. The aim of this study was to explore the potential of CAT in LBP for measuring disability as defined in the International Classification of Functioning, Disability and Health (ICF), which includes impairments, activity limitation, and participation restriction. METHODS: 266 patients with low back pain answered questions from a range of widely used questionnaires. An exploratory factor analysis (EFA) was used to identify disability dimensions, which were then subjected to Rasch analysis. Reliability was tested by internal consistency and person separation index (PSI).
Discriminant validity of disability levels was evaluated by Spearman correlation coefficient (r), intraclass correlation coefficient [ICC(2,1)], and the Bland-Altman approach. A CAT was developed for each dimension, and the results were checked against simulated and real applications from a further 133 patients. RESULTS: Factor analytic techniques identified two dimensions named "body functions" and "activity-participation". After deletion of some items for failure to fit the Rasch model, the remaining items were mostly free of Differential Item Functioning (DIF) for age and gender. Reliability exceeded 0.90 for both dimensions. The disability levels generated using all items and those obtained from the real CAT application were highly correlated (i.e., >0.97 for both dimensions). On average, 19 and 14 items were needed to estimate the precise disability levels using the initial CAT for the first and second dimensions, respectively. However, a marginal increase in the standard error of the estimate across successive iterations substantially reduced the number of items required to make an estimate. CONCLUSIONS: Using a combined approach of EFA and Rasch analysis, this study has shown that it is possible to calibrate items onto a single metric in a way that can be used to provide the basis of a CAT application. Thus there is an opportunity to obtain a wide variety of information to evaluate the biopsychosocial model in its more complex forms, without necessarily increasing the burden of information collection for patients. VL - 9 SN - 1471-2474 (Electronic) N1 - Journal article; BMC musculoskeletal disorders; BMC Musculoskelet Disord. 2008 Dec 18;9(1):166. ER - TY - BOOK T1 - A comparison of two methods of polytomous computerized classification testing for multiple cutscores Y1 - 2007 A1 - Thompson, N. A.
CY - Unpublished doctoral dissertation, University of Minnesota N1 - {PDF file, 363 KB} ER - TY - CHAP T1 - Computerized classification testing with composite hypotheses Y1 - 2007 A1 - Thompson, N. A. A1 - Ro, S. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 96 KB} ER - TY - Generic T1 - Computerized classification testing with composite hypotheses T2 - GMAC Conference on Computerized Adaptive Testing Y1 - 2007 A1 - Thompson, N. A. A1 - Ro, S. KW - computerized adaptive testing JF - GMAC Conference on Computerized Adaptive Testing PB - Graduate Management Admission Council CY - St. Paul, MN N1 - Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. Retrieved [date] from www.psych.umn.edu/psylabs/CATCentral ER - TY - CONF T1 - Cutscore location and classification accuracy in computerized classification testing T2 - Paper presented at the international meeting of the Psychometric Society Y1 - 2007 A1 - Ro, S. A1 - Thompson, N. A. JF - Paper presented at the international meeting of the Psychometric Society CY - Tokyo, Japan N1 - {PDF file, 94 KB} ER - TY - JOUR T1 - The design and evaluation of a computerized adaptive test on mobile devices JF - Computers & Education Y1 - 2007 A1 - Triantafillou, E. A1 - Georgiadou, E. A1 - Economides, A. A. VL - 49 ER - TY - JOUR T1 - Developing tailored instruments: item banking and computerized adaptive assessment JF - Quality of Life Research Y1 - 2007 A1 - Bjorner, J. B. A1 - Chang, C-H. A1 - Thissen, D. A1 - Reeve, B. B. KW - *Health Status KW - *Health Status Indicators KW - *Mental Health KW - *Outcome Assessment (Health Care) KW - *Quality of Life KW - *Questionnaires KW - *Software KW - Algorithms KW - Factor Analysis, Statistical KW - Humans KW - Models, Statistical KW - Psychometrics AB - Item banks and Computerized Adaptive Testing (CAT) have the potential to greatly improve the assessment of health outcomes.
This review describes the unique features of item banks and CAT and discusses how to develop item banks. In CAT, a computer selects the items from an item bank that are most relevant for and informative about the particular respondent, thus optimizing test relevance and precision. Item response theory (IRT) provides the foundation for selecting the items that are most informative for the particular respondent and for scoring responses on a common metric. The development of an item bank is a multi-stage process that requires a clear definition of the construct to be measured, good items, a careful psychometric analysis of the items, and a clear specification of the final CAT. The psychometric analysis needs to evaluate the assumptions of the IRT model, such as unidimensionality and local independence; whether the items function the same way in different subgroups of the population; and whether there is an adequate fit between the data and the chosen item response models. Also, interpretation guidelines need to be established to support the clinical application of the assessment. Although medical research can draw upon expertise from educational testing in the development of item banks and CAT, the medical field also encounters unique opportunities and challenges. VL - 16 SN - 0962-9343 (Print) N1 - Bjorner, Jakob Bue; Chang, Chih-Hung; Thissen, David; Reeve, Bryce B; 1R43NS047763-01/NS/United States NINDS; AG015815/AG/United States NIA; Research Support, N.I.H., Extramural; Netherlands; Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation; Qual Life Res. 2007;16 Suppl 1:95-108. Epub 2007 Feb 15. ER - TY - CHAP T1 - Exploring potential designs for multi-form structure computerized adaptive tests with uniform item exposure Y1 - 2007 A1 - Edwards, M. C. A1 - Thissen, D. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing.
N1 - {PDF file, 295 KB} ER - TY - CHAP T1 - Investigating CAT designs to achieve comparability with a paper test Y1 - 2007 A1 - Thompson, T. A1 - Way, W. D. CY - D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. N1 - {PDF file, 141 KB} ER - TY - JOUR T1 - IRT health outcomes data analysis project: an overview and summary JF - Quality of Life Research Y1 - 2007 A1 - Cook, K. F. A1 - Teal, C. R. A1 - Bjorner, J. B. A1 - Cella, D. A1 - Chang, C-H. A1 - Crane, P. K. A1 - Gibbons, L. E. A1 - Hays, R. D. A1 - McHorney, C. A. A1 - Ocepek-Welikson, K. A1 - Raczek, A. E. A1 - Teresi, J. A. A1 - Reeve, B. B. KW - *Data Interpretation, Statistical KW - *Health Status KW - *Quality of Life KW - *Questionnaires KW - *Software KW - Female KW - HIV Infections/psychology KW - Humans KW - Male KW - Neoplasms/psychology KW - Outcome Assessment (Health Care)/*methods KW - Psychometrics KW - Stress, Psychological AB - BACKGROUND: In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, "Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment." A component of the conference was the presentation of a psychometric and content analysis of a secondary dataset. OBJECTIVES: A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset. RESEARCH DESIGN: HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared. SUBJECTS: The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project.
A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites. MEASURES: Items from 4 HRQOL instruments were evaluated: Cancer Rehabilitation Evaluation System-Short Form, European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, Functional Assessment of Cancer Therapy, and Medical Outcomes Study Short-Form Health Survey. RESULTS AND CONCLUSIONS: Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance of construct definition in the measurement of HRQOL. With respect to these lessons, areas for future research are suggested. The feasibility of developing item banks for broad definitions of health is discussed. VL - 16 SN - 0962-9343 (Print) N1 - Cook, Karon F; Teal, Cayla R; Bjorner, Jakob B; Cella, David; Chang, Chih-Hung; Crane, Paul K; Gibbons, Laura E; Hays, Ron D; McHorney, Colleen A; Ocepek-Welikson, Katja; Raczek, Anastasia E; Teresi, Jeanne A; Reeve, Bryce B; 1U01AR52171-01/AR/United States NIAMS; R01 (CA60068)/CA/United States NCI; Y1-PC-3028-01/PC/United States NCI; Research Support, N.I.H., Extramural; Netherlands; Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation; Qual Life Res. 2007;16 Suppl 1:121-32. Epub 2007 Mar 10. ER - TY - CONF T1 - Item selection in computerized classification testing T2 - Paper presented at the Conference on High Stakes Testing Y1 - 2007 A1 - Thompson, N. A. JF - Paper presented at the Conference on High Stakes Testing CY - University of Nebraska N1 - {PDF file, 87 KB} ER - TY - JOUR T1 - Methodological issues for building item banks and computerized adaptive scales JF - Quality of Life Research Y1 - 2007 A1 - Thissen, D. A1 - Reeve, B. B. A1 - Bjorner, J. B. A1 - Chang, C-H.
AB - This paper reviews important methodological considerations for developing item banks and computerized adaptive scales (commonly called computerized adaptive tests in the educational measurement literature, yielding the acronym CAT), including issues of the reference population, dimensionality, dichotomous versus polytomous response scales, differential item functioning (DIF) and conditional scoring, mode effects, the impact of local dependence, and innovative approaches to assessment using CATs in health outcomes research. VL - 16 SN - 0962-9343; 1573-2649 ER - TY - JOUR T1 - A Practitioner’s Guide for Variable-length Computerized Classification Testing JF - Practical Assessment Research and Evaluation Y1 - 2007 A1 - Thompson, N. A. VL - 12 IS - 1 ER - TY - Generic T1 - A practitioner's guide to variable-length computerized classification testing Y1 - 2007 A1 - Thompson, N. A. KW - CAT KW - classification KW - computer adaptive testing KW - computerized adaptive testing KW - Computerized classification testing AB - Variable-length computerized classification tests (CCTs; Lin & Spray, 2000; Thompson, 2006) are a powerful and efficient approach to testing for the purpose of classifying examinees into groups. CCTs are designed by the specification of at least five technical components: psychometric model, calibrated item bank, starting point, item selection algorithm, and termination criterion. Several options exist for each of these CCT components, creating a myriad of possible designs. Confusion among designs is exacerbated by the lack of a standardized nomenclature. This article outlines the components of a CCT, common options for each component, and the interaction of options for different components, so that practitioners may more efficiently design CCTs. It also offers a suggestion of nomenclature.
JF - Practical Assessment, Research and Evaluation VL - 12 ER - TY - JOUR T1 - Prospective evaluation of the AM-PAC-CAT in outpatient rehabilitation settings JF - Physical Therapy Y1 - 2007 A1 - Jette, A. A1 - Haley, S. A1 - Tao, W. A1 - Ni, P. A1 - Moed, R. A1 - Meyers, D. A1 - Zurek, M. VL - 87 ER - TY - JOUR T1 - Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS) JF - Medical Care Y1 - 2007 A1 - Reeve, B. B. A1 - Hays, R. D. A1 - Bjorner, J. B. A1 - Cook, K. F. A1 - Crane, P. K. A1 - Teresi, J. A. A1 - Thissen, D. A1 - Revicki, D. A. A1 - Weiss, D. J. A1 - Hambleton, R. K. A1 - Liu, H. A1 - Gershon, R. C. A1 - Reise, S. P. A1 - Lai, J. S. A1 - Cella, D. KW - *Health Status KW - *Information Systems KW - *Quality of Life KW - *Self Disclosure KW - Adolescent KW - Adult KW - Aged KW - Calibration KW - Databases as Topic KW - Evaluation Studies as Topic KW - Female KW - Humans KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Psychometrics KW - Questionnaires/standards KW - United States AB - BACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks.
ANALYSES: Analyses include evaluation of data quality (e.g., logic and range checking, spread of response distribution within an item), descriptive statistics (e.g., frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment. VL - 45 SN - 0025-7079 (Print) N1 - Reeve, Bryce B; Hays, Ron D; Bjorner, Jakob B; Cook, Karon F; Crane, Paul K; Teresi, Jeanne A; Thissen, David; Revicki, Dennis A; Weiss, David J; Hambleton, Ronald K; Liu, Honghu; Gershon, Richard; Reise, Steven P; Lai, Jin-shei; Cella, David; PROMIS Cooperative Group; AG015815/AG/United States NIA; Research Support, N.I.H., Extramural; United States; Medical care; Med Care. 2007 May;45(5 Suppl 1):S22-31. ER - TY - JOUR T1 - A review of item exposure control strategies for computerized adaptive testing developed from 1983 to 2005 JF - Journal of Technology, Learning, and Assessment Y1 - 2007 A1 - Georgiadou, E. A1 - Triantafillou, E. A1 - Economides, A. A. AB - Since researchers recognized the several advantages of computerized adaptive testing (CAT) over traditional linear test administration, the issue of item exposure control has received increased attention. Due to CAT’s underlying philosophy, particular items in the item pool may be presented too often and become overexposed, while other items are rarely selected by the CAT algorithm and thus become underexposed. Several item exposure control strategies have been presented in the literature aiming to prevent overexposure of some items and to increase the use rate of rarely or never selected items. This paper reviews such strategies that appeared in the relevant literature from 1983 to 2005.
The focus of this paper is on studies that have been conducted in order to evaluate the effectiveness of item exposure control strategies for dichotomous scoring, polytomous scoring and testlet-based CAT systems. In addition, the paper discusses the strengths and weaknesses of each strategy group using examples from simulation studies. No new research is presented but rather a compendium of models is reviewed with an overall objective of providing researchers of this field, especially newcomers, a wide view of item exposure control strategies. VL - 5 IS - 8 N1 - http://www.jtla.org ER - TY - JOUR T1 - Test design optimization in CAT early stage with the nominal response model JF - Applied Psychological Measurement Y1 - 2007 A1 - Passos, V. L. A1 - Berger, M. P. F. A1 - Tan, F. E. KW - computerized adaptive testing KW - nominal response model KW - robust performance KW - test design optimization AB - The early stage of computerized adaptive testing (CAT) refers to the phase of the trait estimation during the administration of only a few items. This phase can be characterized by bias and instability of estimation. In this study, an item selection criterion is introduced in an attempt to lessen this instability: the D-optimality criterion. A polytomous unconstrained CAT simulation is carried out to evaluate this criterion's performance under different test premises. The simulation shows that the extent of early stage instability depends primarily on the quality of the item pool information and its size and secondarily on the item selection criteria. The efficiency of the D-optimality criterion is similar to the efficiency of other known item selection criteria. Yet, it often yields estimates that, at the beginning of CAT, display a more robust performance against instability.
PB - Sage Publications: US VL - 31 SN - 0146-6216 (Print) ER - TY - JOUR T1 - The comparison among item selection strategies of CAT with multiple-choice items JF - Acta Psychologica Sinica Y1 - 2006 A1 - Hai-qi, D. A1 - De-zhi, C. A1 - Shuliang, D. A1 - Taiping, D. KW - CAT KW - computerized adaptive testing KW - graded response model KW - item selection strategies KW - multiple choice items AB - The initial purpose of comparing item selection strategies for CAT was to increase test efficiency. As studies continued, however, it was found that increasing the efficiency of item bank usage was also an important goal. These two goals often conflicted, so the key was to find a strategy that could accomplish both. The item selection strategies for the graded response model in this study included: matching the average of the difficulty orders with the ability; matching the median of the difficulty orders with the ability; maximum information; a-stratified (average); and a-stratified (median). The evaluation indexes used for comparison included: the bias of ability estimates relative to true ability; the standard error of ability estimates; the average number of items administered to examinees; the standard deviation of item selection frequencies; and a weighted sum of these indices. Using Monte Carlo simulation, each condition was replicated 20 times, with item difficulty parameters following either a normal or a uniform distribution. The results indicated that, whether the difficulty parameters followed a normal or a uniform distribution, every item selection strategy designed in this research had its strong and weak points.
Overall, when items were stratified appropriately, a-stratified (median) (ASM) performed best. PB - Science Press: China VL - 38 SN - 0439-755X (Print) ER - TY - JOUR T1 - Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes JF - Archives of Physical Medicine and Rehabilitation Y1 - 2006 A1 - Haley, S. M. A1 - Siebens, H. A1 - Coster, W. J. A1 - Tao, W. A1 - Black-Schaffer, R. M. A1 - Gandek, B. A1 - Sinclair, S. J. A1 - Ni, P. KW - *Activities of Daily Living KW - *Adaptation, Physiological KW - *Computer Systems KW - *Questionnaires KW - Adult KW - Aged KW - Aged, 80 and over KW - Chi-Square Distribution KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Longitudinal Studies KW - Male KW - Middle Aged KW - Outcome Assessment (Health Care)/*methods KW - Patient Discharge KW - Prospective Studies KW - Rehabilitation/*standards KW - Subacute Care/*standards AB - OBJECTIVE: To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home. DESIGN: Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit. SETTING: Follow-up visits conducted in patients' home setting. PARTICIPANTS: Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions. INTERVENTIONS: Not applicable.
MAIN OUTCOME MEASURES: Summary scores from AM-PAC-CAT, including 3 activity domains of movement and physical, personal care and instrumental, and applied cognition, were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66). RESULTS: AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77-.86) with scores from the AM-PAC-66. On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. The standardized response mean (SRM) was greater for the movement and physical fixed form than the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity between CAT and fixed formats. Using patients' own report as an anchor-based measure of change, the CAT and fixed length formats were comparable in responsiveness to patient-reported change over a 3-month interval. CONCLUSIONS: Accurate estimates for functional activity group-level changes can be obtained from CAT administrations, with a considerable reduction in administration time. VL - 87 SN - 0003-9993 (Print) N1 - Haley, Stephen M; Siebens, Hilary; Coster, Wendy J; Tao, Wei; Black-Schaffer, Randie M; Gandek, Barbara; Sinclair, Samuel J; Ni, Pengsheng; K0245354-01/phs; R01 hd043568/hd/nichd; Research Support, N.I.H., Extramural; United States; Archives of physical medicine and rehabilitation; Arch Phys Med Rehabil. 2006 Aug;87(8):1033-42. ER - TY - JOUR T1 - Evaluation parameters for computer adaptive testing JF - British Journal of Educational Technology Y1 - 2006 A1 - Georgiadou, E. A1 - Triantafillou, E. A1 - Economides, A. A. VL - 37 IS - 2 ER - TY - JOUR T1 - Overview of quantitative measurement methods. Equivalence, invariance, and differential item functioning in health applications JF - Medical Care Y1 - 2006 A1 - Teresi, J. A.
KW - *Cross-Cultural Comparison KW - Data Interpretation, Statistical KW - Factor Analysis, Statistical KW - Guidelines as Topic KW - Humans KW - Models, Statistical KW - Psychometrics/*methods KW - Statistics as Topic/*methods KW - Statistics, Nonparametric AB - BACKGROUND: Reviewed in this article are issues relating to the study of invariance and differential item functioning (DIF). The aim of factor analyses and DIF, in the context of invariance testing, is the examination of group differences in item response conditional on an estimate of disability. Discussed are parameters and statistics that are not invariant and cannot be compared validly in cross-cultural studies with varying distributions of disability, in contrast to those that can be compared (if the model assumptions are met) because they are produced by models such as linear and nonlinear regression. OBJECTIVES: The purpose of this overview is to provide an integrated approach to the quantitative methods used in this special issue to examine measurement equivalence. The methods include classical test theory (CTT), factor analytic, and parametric and nonparametric approaches to DIF detection. Also included in the quantitative section is a discussion of item banking and computerized adaptive testing (CAT). METHODS: Factorial invariance and the articles discussing this topic are introduced. A brief overview of the DIF methods presented in the quantitative section of the special issue is provided together with a discussion of ways in which DIF analyses and examination of invariance using factor models may be complementary. CONCLUSIONS: Although factor analytic and DIF detection methods share features, they provide unique information and can be viewed as complementary in informing about measurement equivalence. VL - 44 SN - 0025-7079 (Print) 0025-7079 (Linking) N1 - Teresi, Jeanne A; AG15294/AG/NIA NIH HHS/United States; Research Support, N.I.H., Extramural; Research Support, Non-U.S.
Gov't; Review; United States; Medical care; Med Care. 2006 Nov;44(11 Suppl 3):S39-49. ER - TY - JOUR T1 - Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function JF - Journal of Clinical Epidemiology Y1 - 2006 A1 - Hart, D. L. A1 - Cook, K. F. A1 - Mioduski, J. E. A1 - Teal, C. R. A1 - Crane, P. K. KW - computerized adaptive testing KW - Flexilevel Scale of Shoulder Function KW - Item Response Theory KW - Rehabilitation AB -

Background and Objective: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (theta(IRT)) and measures generated using the simulated CAT (theta(CAT)).
Study Design and Setting: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items.
Results: Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The theta(IRT) and theta(CAT) measures were highly correlated (r = .96) and resulted in similar classifications of patients.
Conclusion: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability.

VL - 59 IS - 3 ER - TY - JOUR T1 - Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function JF - Journal of Clinical Epidemiology Y1 - 2006 A1 - Hart, D. L. A1 - Cook, K. F. A1 - Mioduski, J. E. A1 - Teal, C. R. A1 - Crane, P. K. KW - *Computer Simulation KW - *Range of Motion, Articular KW - Activities of Daily Living KW - Adult KW - Aged KW - Aged, 80 and over KW - Factor Analysis, Statistical KW - Female KW - Humans KW - Male KW - Middle Aged KW - Prospective Studies KW - Reproducibility of Results KW - Research Support, N.I.H., Extramural KW - Research Support, U.S. Gov't, Non-P.H.S. KW - Shoulder Dislocation/*physiopathology/psychology/rehabilitation KW - Shoulder Pain/*physiopathology/psychology/rehabilitation KW - Shoulder/*physiopathology KW - Sickness Impact Profile KW - Treatment Outcome AB - BACKGROUND AND OBJECTIVE: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (theta(IRT)) and measures generated using the simulated CAT (theta(CAT)). STUDY DESIGN AND SETTING: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items. RESULTS: Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The theta(IRT) and theta(CAT) measures were highly correlated (r = .96) and resulted in similar classifications of patients. 
CONCLUSION: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability. VL - 59 N1 - 0895-4356 (Print); Journal Article; Validation Studies ER - TY - BOOK T1 - Adaptive selection of personality items to inform a neural network predicting job performance Y1 - 2005 A1 - Thissen-Roe, A. CY - Unpublished doctoral dissertation, University of Washington ER - TY - JOUR T1 - A computer-assisted test design and diagnosis system for use by classroom teachers JF - Journal of Computer Assisted Learning Y1 - 2005 A1 - He, Q. A1 - Tymms, P. KW - Computer Assisted Testing KW - Computer Software KW - Diagnosis KW - Educational Measurement KW - Teachers AB - Computer-assisted assessment (CAA) has become increasingly important in education in recent years. A variety of computer software systems have been developed to help assess the performance of students at various levels. However, such systems are primarily designed to provide objective assessment of students and analysis of test items, and focus has been mainly placed on higher and further education. Although there are commercial professional systems available for use by primary and secondary educational institutions, such systems are generally expensive and require skilled expertise to operate. In view of the rapid progress made in the use of computer-based assessment for primary and secondary students by education authorities here in the UK and elsewhere, there is a need to develop systems which are economical and easy to use and can provide the necessary information that can help teachers improve students' performance. This paper presents the development of a software system that provides a range of functions including generating items and building item banks, designing tests, conducting tests on computers and analysing test results.
Specifically, the system can generate information on the performance of students and test items that can be easily used to identify curriculum areas where students are underperforming. A case study based on data collected from five secondary schools in Hong Kong involved in the Curriculum, Evaluation and Management Centre's Middle Years Information System Project, Durham University, UK, has been undertaken to demonstrate the use of the system for diagnostic and performance analysis. VL - 21 ER - TY - JOUR T1 - Data pooling and analysis to build a preliminary item bank: an example using bowel function in prostate cancer JF - Evaluation and the Health Professions Y1 - 2005 A1 - Eton, D. T. A1 - Lai, J. S. A1 - Cella, D. A1 - Reeve, B. B. A1 - Talcott, J. A. A1 - Clark, J. A. A1 - McPherson, C. P. A1 - Litwin, M. S. A1 - Moinpour, C. M. KW - *Quality of Life KW - *Questionnaires KW - Adult KW - Aged KW - Data Collection/methods KW - Humans KW - Intestine, Large/*physiopathology KW - Male KW - Middle Aged KW - Prostatic Neoplasms/*physiopathology KW - Psychometrics KW - Research Support, Non-U.S. Gov't KW - Statistics, Nonparametric AB - Assessing bowel function (BF) in prostate cancer can help determine therapeutic trade-offs. We determined the components of BF commonly assessed in prostate cancer studies as an initial step in creating an item bank for clinical and research application. We analyzed six archived data sets representing 4,246 men with prostate cancer. Thirty-one items from validated instruments were available for analysis. Items were classified into domains (diarrhea, rectal urgency, pain, bleeding, bother/distress, and other) and then subjected to conventional psychometric and item response theory (IRT) analyses. Items fit the IRT model if the ratio between observed and expected item variance was between 0.60 and 1.40. Four of 31 items had inadequate fit in at least one analysis.
Poorly fitting items included bleeding (2), rectal urgency (1), and bother/distress (1). A fifth item assessing hemorrhoids was poorly correlated with other items. Our analyses supported four related components of BF: diarrhea, rectal urgency, pain, and bother/distress. VL - 28 N1 - 0163-2787 (Print); Journal Article ER - TY - JOUR T1 - Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire JF - Alcoholism: Clinical & Experimental Research Y1 - 2005 A1 - Kahler, C. W. A1 - Strong, D. R. A1 - Read, J. P. A1 - De Boeck, P. A1 - Wilson, M. A1 - Acton, G. S. A1 - Palfai, T. P. A1 - Wood, M. D. A1 - Mehta, P. D. A1 - Neale, M. C. A1 - Flay, B. R. A1 - Conklin, C. A. A1 - Clayton, R. R. A1 - Tiffany, S. T. A1 - Shiffman, S. A1 - Krueger, R. F. A1 - Nichol, P. E. A1 - Hicks, B. M. A1 - Markon, K. E. A1 - Patrick, C. J. A1 - Iacono, William G. A1 - McGue, Matt A1 - Langenbucher, J. W. A1 - Labouvie, E. A1 - Martin, C. S. A1 - Sanjuan, P. M. A1 - Bavly, L. A1 - Kirisci, L. A1 - Chung, T. A1 - Vanyukov, M. A1 - Dunn, M. A1 - Tarter, R. A1 - Handel, R. W. A1 - Ben-Porath, Y. S. A1 - Watt, M. KW - Psychometrics KW - Substance-Related Disorders AB - Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias. Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis.
An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample. Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items. Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided. (C) 2005 Research Society on Alcoholism. An important, sometimes controversial feature of all psychological phenomena is whether they are categorical or dimensional. A conceptual and psychometric framework is described for distinguishing whether the latent structure behind manifest categories (e.g., psychiatric diagnoses, attitude groups, or stages of development) is category-like or dimension-like. Being dimension-like requires (a) within-category heterogeneity and (b) between-category quantitative differences. Being category-like requires (a) within-category homogeneity and (b) between-category qualitative differences. The relation between this classification and abrupt versus smooth differences is discussed. Hybrid structures are possible.
Being category-like is itself a matter of degree; the authors offer a formalized framework to determine this degree. Empirical applications to personality disorders, attitudes toward capital punishment, and stages of cognitive development illustrate the approach. (C) 2005 by the American Psychological Association. The authors conducted Rasch model (G. Rasch, 1960) analyses of items from the Young Adult Alcohol Problems Screening Test (YAAPST; S. C. Hurlbut & K. J. Sher, 1992) to examine the relative severity and ordering of alcohol problems in 806 college students. Items appeared to measure a single dimension of alcohol problem severity, covering a broad range of the latent continuum. Items fit the Rasch model well, with less severe symptoms reliably preceding more severe symptoms in a potential progression toward increasing levels of problem severity. However, certain items did not index problem severity consistently across demographic subgroups. A shortened, alternative version of the YAAPST is proposed, and a norm table is provided that allows for a linking of total YAAPST scores to expected symptom expression. (C) 2004 by the American Psychological Association. A didactic on latent growth curve modeling for ordinal outcomes is presented. The conceptual aspects of modeling growth with ordinal variables and the notion of threshold invariance are illustrated graphically using a hypothetical example. The ordinal growth model is described in terms of 3 nested models: (a) multivariate normality of the underlying continuous latent variables (yt) and its relationship with the observed ordinal response pattern (Yt), (b) threshold invariance over time, and (c) growth model for the continuous latent variable on a common scale. Algebraic implications of the model restrictions are derived, and practical aspects of fitting ordinal growth models are discussed with the help of an empirical example and Mx script (M. C. Neale, S. M. Boker, G. Xie, & H. H. Maes, 1999).
The necessary conditions for the identification of growth models with ordinal data and the methodological implications of the model of threshold invariance are discussed. (C) 2004 by the American Psychological Association. Recent research points toward the viability of conceptualizing alcohol problems as arrayed along a continuum. Nevertheless, modern statistical techniques designed to scale multiple problems along a continuum (latent trait modeling; LTM) have rarely been applied to alcohol problems. This study applies LTM methods to data on 110 problems reported during in-person interviews of 1,348 middle-aged men (mean age = 43) from the general population. The results revealed a continuum of severity linking the 110 problems, ranging from heavy and abusive drinking, through tolerance and withdrawal, to serious complications of alcoholism. These results indicate that alcohol problems can be arrayed along a dimension of severity and emphasize the relevance of LTM to informing the conceptualization and assessment of alcohol problems. (C) 2004 by the American Psychological Association. Item response theory (IRT) is supplanting classical test theory as the basis for measures development. This study demonstrated the utility of IRT for evaluating DSM-IV diagnostic criteria. Data on alcohol, cannabis, and cocaine symptoms from 372 adult clinical participants interviewed with the Composite International Diagnostic Interview-Expanded Substance Abuse Module (CIDI-SAM) were analyzed with Mplus (B. Muthen & L. Muthen, 1998) and MULTILOG (D. Thissen, 1991) software. Tolerance and legal problems criteria were dropped because of poor fit with a unidimensional model. Item response curves, test information curves, and testing of variously constrained models suggested that DSM-IV criteria in the CIDI-SAM discriminate between only impaired and less impaired cases and may not be useful to scale case severity.
IRT can be used to study the construct validity of DSM-IV diagnoses and to identify diagnostic criteria with poor performance. (C) 2004 by the American Psychological Association. This study examined the psychometric characteristics of an index of substance use involvement using item response theory. The sample consisted of 292 men and 140 women who qualified for a Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.; American Psychiatric Association, 1987) substance use disorder (SUD) diagnosis and 293 men and 445 women who did not qualify for a SUD diagnosis. The results indicated that men had a higher probability of endorsing substance use compared with women. The index significantly predicted health, psychiatric, and psychosocial disturbances as well as level of substance use behavior and severity of SUD after a 2-year follow-up. Finally, this index is a reliable and useful prognostic indicator of the risk for SUD and the medical and psychosocial sequelae of drug consumption. (C) 2002 by the American Psychological Association. Comparability, validity, and impact of loss of information of a computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 140 Veterans Affairs hospital patients. The countdown method (Butcher, Keller, & Bacon, 1985) was used to adaptively administer Scales L (Lie) and F (Frequency), the 10 clinical scales, and the 15 content scales. Participants completed the MMPI-2 twice, in 1 of 2 conditions: computerized conventional test-retest, or computerized conventional-computerized adaptive. Mean profiles and test-retest correlations across modalities were comparable. Correlations between MMPI-2 scales and criterion measures supported the validity of the countdown method, although some attenuation of validity was suggested for certain health-related items. Loss of information incurred with this mode of adaptive testing has minimal impact on test validity.
Item and time savings were substantial. (C) 1999 by the American Psychological Association VL - 29 N1 - Miscellaneous; Article; Miscellaneous Article ER - TY - JOUR T1 - Pre-equating: a simulation study based on a large scale assessment model JF - Journal of Applied Measurement Y1 - 2004 A1 - Taherbhai, H. M. A1 - Young, M. J. KW - *Databases KW - *Models, Theoretical KW - Calibration KW - Human KW - Psychometrics KW - Reference Values KW - Reproducibility of Results AB - Although post-equating (PE) has proven to be an acceptable method in the scaling and equating of items and forms, there are times when the turn-around period for equating and converting raw scores to scale scores is so short that PE cannot be undertaken within the prescribed time frame. In such cases, pre-equating (PrE) could be considered as an acceptable alternative. Assessing the feasibility of using item calibrations from the item bank (as in PrE) is conditioned on the equivalency of the calibrations and the errors associated with them vis-à-vis the results obtained via PE. This paper creates item banks over three periods of item introduction into the banks and uses the Rasch model in examining data with respect to the recovery of item parameters, the measurement error, and the effect cut-points have on examinee placement in both the PrE and PE situations. Results indicate that PrE is a viable alternative to PE provided the stability of the item calibrations is enhanced by using large sample sizes (perhaps as large as full-population) in populating the item bank.
VL - 5 N1 - 1529-7713; Journal Article ER - TY - JOUR T1 - Siette: a web-based tool for adaptive testing JF - International Journal of Artificial Intelligence in Education Y1 - 2004 A1 - Conejo, R A1 - Guzmán, E A1 - Millán, E A1 - Trella, M A1 - Pérez-De-La-Cruz, JL A1 - Ríos, A KW - computerized adaptive testing VL - 14 ER - TY - JOUR T1 - A Bayesian method for the detection of item preknowledge in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2003 A1 - McLeod, L. D. A1 - Lewis, C. A1 - Thissen, D. VL - 27 ER - TY - JOUR T1 - A Bayesian method for the detection of item preknowledge in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2003 A1 - McLeod, L. A1 - Lewis, C. A1 - Thissen, D. KW - Adaptive Testing KW - Cheating KW - Computer Assisted Testing KW - Individual Differences KW - computerized adaptive testing KW - Item Analysis (Statistical) KW - Item Response Theory KW - Mathematical Modeling AB - With the increased use of continuous testing in computerized adaptive testing, new concerns about test security have evolved, such as how to ensure that items in an item pool are safeguarded from theft. In this article, procedures to detect test takers using item preknowledge are explored. When test takers use item preknowledge, their item responses deviate from the underlying item response theory (IRT) model, and estimated abilities may be inflated. This deviation may be detected through the use of person-fit indices. A Bayesian posterior log odds ratio index is proposed for detecting the use of item preknowledge. In this approach to person fit, the estimated probability that each test taker has preknowledge of items is updated after each item response. These probabilities are based on the IRT parameters, a model specifying the probability that each item has been memorized, and the test taker's item responses.
Simulations based on an operational computerized adaptive test (CAT) pool are used to demonstrate the use of the odds ratio index. VL - 27 ER - TY - CONF T1 - Evaluating stability of online item calibrations under varying conditions T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2003 A1 - Thomasson, G. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CONF T1 - The evaluation of exposure control procedures for an operational CAT. T2 - Paper presented at the Annual Meeting of the American Educational Research Association Y1 - 2003 A1 - French, B. F. A1 - Thompson, T. T. JF - Paper presented at the Annual Meeting of the American Educational Research Association CY - Chicago IL ER - TY - JOUR T1 - Application of an empirical Bayes enhancement of Mantel-Haenszel differential item functioning analysis to a computerized adaptive test JF - Applied Psychological Measurement Y1 - 2002 A1 - Zwick, R. A1 - Thayer, D. T. VL - 26 ER - TY - JOUR T1 - Computer adaptive testing: The impact of test characteristics on perceived performance and test takers' reactions JF - Dissertation Abstracts International: Section B: the Sciences & Engineering Y1 - 2002 A1 - Tonidandel, S. KW - computerized adaptive testing AB - This study examined the relationship between characteristics of adaptive testing and test takers' subsequent reactions to the test. Participants took a computer adaptive test in which two features, the difficulty of the initial item and the difficulty of subsequent items, were manipulated. These two features of adaptive testing determined the number of items answered correctly by examinees and their subsequent reactions to the test. The data show that the relationship between test characteristics and reactions was fully mediated by perceived performance on the test. 
In addition, the impact of feedback on reactions to adaptive testing was evaluated. In general, feedback that was consistent with perceptions of performance had a positive impact on reactions to the test. Implications for adaptive test design concerning maximizing test takers' reactions are discussed. VL - 62 ER - TY - JOUR T1 - Computer-adaptive testing: The impact of test characteristics on perceived performance and test takers’ reactions JF - Journal of Applied Psychology Y1 - 2002 A1 - Tonidandel, S. A1 - Quiñones, M. A. A1 - Adams, A. A. VL - 87 ER - TY - CONF T1 - Developing tailored instruments: Item banking and computerized adaptive assessment T2 - Paper presented at the conference “Advances in Health Outcomes Measurement” Y1 - 2002 A1 - Thissen, D. JF - Paper presented at the conference “Advances in Health Outcomes Measurement” CY - Bethesda, Maryland, June 23-25 ER - TY - CONF T1 - Employing new ideas in CAT to a simulated reading test T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2002 A1 - Thompson, T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans LA ER - TY - CONF T1 - An investigation of procedures for estimating error indexes in proficiency estimation in CAT T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2002 A1 - Shyu, C.-Y. A1 - Fan, M. A1 - Thompson, T. A1 - Hsu, Y. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New Orleans LA ER - TY - CONF T1 - Mapping the Development of Pre-reading Skills with STAR Early Literacy T2 - Presentation to the Annual Meeting of the Society for the Scientific Study of Reading. Chicago. Y1 - 2002 A1 - McBride, J. R. A1 - Tardrew, S.P. 
JF - Presentation to the Annual Meeting of the Society for the Scientific Study of Reading. Chicago. ER - TY - CONF T1 - Effects of changes in the examinees’ ability distribution on the exposure control methods in CAT T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Chang, S-W. A1 - Twu, B.-Y. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA N1 - #CH01-02 ER - TY - JOUR T1 - Evaluation of an MMPI-A short form: Implications for adaptive testing JF - Journal of Personality Assessment Y1 - 2001 A1 - Archer, R. P. A1 - Tirrell, C. A. A1 - Elkins, D. E. KW - Adaptive Testing KW - Mean KW - Minnesota Multiphasic Personality Inventory KW - Psychometrics KW - Statistical Correlation KW - Statistical Samples KW - Test Forms AB - Reports some psychometric properties of an MMPI-Adolescent version (MMPI-A; J. N. Butcher et al., 1992) short form based on administration of the first 150 items of this test instrument. The authors report results for both the MMPI-A normative sample of 1,620 adolescents (aged 14-18 yrs) and a clinical sample of 565 adolescents (mean age 15.2 yrs) in a variety of treatment settings. The authors summarize results for the MMPI-A basic scales in terms of Pearson product-moment correlations generated between full administration and short-form administration formats and mean T score elevations for the basic scales generated by each approach. In this investigation, the authors also examine single-scale and 2-point congruences found for the MMPI-A basic clinical scales as derived from standard and short-form administrations. The authors present the relative strengths and weaknesses of the MMPI-A short form and discuss the findings in terms of implications for attempts to shorten the item pool through the use of computerized adaptive assessment approaches. 
VL - 76 ER - TY - CONF T1 - An investigation of procedures for estimating error indexes in proficiency estimation in a realistic second-order equitable CAT environment T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 2001 A1 - Shyu, C.-Y. A1 - Fan, M. A1 - Thompson, T. A1 - Hsu. JF - Paper presented at the annual meeting of the American Educational Research Association CY - Seattle WA ER - TY - CHAP T1 - Item response theory applied to combinations of multiple-choice and constructed-response items--approximation methods for scale scores T2 - Test scoring Y1 - 2001 A1 - Thissen, D. A1 - Nelson, L. A. A1 - Swygert, K. A. KW - Adaptive Testing KW - Item Response Theory KW - Multiple Choice (Testing Method) KW - Scoring (Testing) KW - Statistical Estimation KW - Statistical Weighting KW - Test Items KW - Test Scores AB - (From the chapter) The authors develop approximate methods that replace the scoring tables with weighted linear combinations of the component scores. Topics discussed include: a linear approximation for the extension to combinations of scores; the generalization of two or more scores; potential applications of linear approximations to item response theory in computerized adaptive tests; and evaluation of the pattern-of-summed-scores, and Gaussian approximation, estimates of proficiency. JF - Test scoring PB - Lawrence Erlbaum Associates CY - Mahwah, N.J. USA N1 - Test scoring (pp. 293-341). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers. xii, 422 pp ER - TY - JOUR T1 - Multidimensional adaptive testing using the weighted likelihood estimation JF - Dissertation Abstracts International Section A: Humanities & Social Sciences Y1 - 2001 A1 - Tseng, F-L. 
KW - computerized adaptive testing AB - This study extended Warm's (1989) weighted likelihood estimation (WLE) to a multidimensional computerized adaptive test (MCAT) setting. WLE was compared with maximum likelihood estimation (MLE), expected a posteriori (EAP), and maximum a posteriori (MAP) estimation using a three-dimensional 3PL IRT model under a variety of computerized adaptive testing conditions. The dependent variables included bias, standard error of ability estimates (SE), root mean square error (RMSE), and test information. The independent variables were ability estimation methods, intercorrelation levels between dimensions, multidimensional structures, and ability combinations. Simulation results were presented as descriptive statistics in figures and tables. In addition, inferential procedures were used to analyze bias by conceptualizing this Monte Carlo study as a statistical sampling experiment. The results of this study indicate that WLE and the other three estimation methods yield significantly more accurate ability estimates under an approximate simple test structure with one dominant dimension and several secondary dimensions. All four estimation methods, especially WLE, yield very large SEs when a structure with three equally dominant dimensions was employed. Consistent with previous findings based on unidimensional IRT models, MLE and WLE are less biased at the extremes of the ability scale; MLE and WLE yield larger SEs than the Bayesian methods; test information-based SEs underestimate the actual SEs of both MLE and WLE in MCAT situations, especially at shorter test lengths, similar to the findings of Warm (1989) in the unidimensional case; and WLE reduced the bias of MLE under the approximate simple structure. 
The results from the MCAT simulations did show some advantages of WLE in reducing the bias of MLE under the approximate simple structure with a fixed test length of 50 items, which was consistent with previous research findings based on different unidimensional models. The current results make clear that all four methods perform very poorly when multidimensional structures with multiple dominant factors were employed. Further research is needed to investigate systematically how different multidimensional structures affect the accuracy and reliability of ability estimation. In the simulated results of this study, the intercorrelation between dimensions had no significant effect on ability estimation. VL - 61 ER - TY - CONF T1 - Multidimensional adaptive testing using weighted likelihood estimation: A comparison of estimation methods T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2001 A1 - Tseng, F.-E. A1 - Hsu, T.-C. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Seattle WA ER - TY - JOUR T1 - Pasado, presente y futuro de los test adaptativos informatizados: Entrevista con Isaac I. Béjar [Past, present and future of computerized adaptive testing: Interview with Isaac I. Béjar] JF - Psicothema Y1 - 2001 A1 - Tejada, R. A1 - Antonio, J. KW - computerized adaptive testing 
AB - This paper presents the results of an interview with Isaac I. Béjar. Dr. Béjar is currently Principal Research Scientist and Director of the Center for Assessment Design and Scoring in the Research Division at Educational Testing Service (Princeton, NJ, U.S.A.). The aim of this interview was to review the past, present, and future of computerized adaptive tests. The beginnings of adaptive tests and computerized adaptive tests, and the latest advances developed at Educational Testing Service (generative response models, isomorphs, automated scoring…), are reviewed. The interview closes with a view of the future of computerized adaptive tests and their use in Spain. VL - 13 SN - 0214-9915 ER - TY - CONF T1 - Applying specific information item selection to a passage-based test T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 2000 A1 - Thompson, T.D. A1 - Davey, T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - New Orleans, LA, April ER - TY - BOOK T1 - Computerized adaptive testing: A primer (2nd edition) Y1 - 2000 A1 - Wainer, H. A1 - Dorans, N. A1 - Eignor, D. R. A1 - Flaugher, R. A1 - Green, B. F. A1 - Mislevy, R. A1 - Steinberg, L. A1 - Thissen, D. CY - Hillsdale, N. J. 
: Lawrence Erlbaum Associates ER - TY - JOUR T1 - The impact of receiving the same items on consecutive computer adaptive test administrations JF - Journal of Applied Measurement Y1 - 2000 A1 - O'Neill, T. A1 - Lunz, M. E. A1 - Thiede, K. AB - Addresses item exposure in a Computerized Adaptive Test (CAT) when the item selection algorithm is permitted to present examinees with questions that they have already been asked in a previous test administration. The data were from a national certification exam in medical technology. The responses of 178 repeat examinees were compared. The results indicate that the combined use of an adaptive algorithm to select items and latent trait theory to estimate person ability provides substantial protection from score contamination. The implications for constraints that prohibit examinees from seeing an item twice are discussed. VL - 1 N1 - Richard M Smith, US ER - TY - JOUR T1 - Psychological reactions to adaptive testing JF - International Journal of Selection and Assessment Y1 - 2000 A1 - Tonidandel, S. A1 - Quiñones, M. A. VL - 8 ER - TY - CHAP T1 - Using Bayesian Networks in Computerized Adaptive Tests Y1 - 2000 A1 - Millan, E. A1 - Trella, M A1 - Perez-de-la-Cruz, J.-L. A1 - Conejo, R CY - M. Ortega and J. Bravo (Eds.), Computers and Education in the 21st Century. Kluwer, pp. 217-228. ER - TY - CONF T1 - Automated flawed item detection and graphical item used in on-line calibration of CAT-ASVAB. T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Krass, I. A. A1 - Thomasson, G. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CONF T1 - Constructing adaptive tests to parallel conventional programs T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Fan, M. A1 - Thompson, T. 
A1 - Davey, T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal N1 - #FA99-01 ER - TY - CHAP T1 - The development of a computerized adaptive selection system for computer programmers in a financial services company Y1 - 1999 A1 - Zickar, M. J. A1 - Overton, R. C. A1 - Taylor, L. R. A1 - Harms, H. J. CY - F. Drasgow and J. B. Olsen (Eds.), Innovations in computerized assessment (pp. 7-33). Mahwah NJ: Erlbaum. ER - TY - CONF T1 - Implications from information functions and standard errors for determining preferred normed scales for CAT and P and P ASVAB T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Nicewander, W. A. A1 - Thomasson, G. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - CONF T1 - Pretesting alongside an operational CAT T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1999 A1 - Davey, T. A1 - Pommerich, M. A1 - Thompson, D. T. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Montreal, Canada ER - TY - JOUR T1 - Some reliability estimates for computerized adaptive tests JF - Applied Psychological Measurement Y1 - 1999 A1 - Nicewander, W. A. A1 - Thomasson, G. L. AB - Three reliability estimates are derived for the Bayes modal estimate (BME) and the maximum likelihood estimate (MLE) of θ in computerized adaptive tests (CAT). Each reliability estimate is a function of test information. Two of the estimates are shown to be upper bounds to true reliability. The three reliability estimates and the true reliabilities of both MLE and BME were computed for seven simulated CATs. Results showed that the true reliabilities for MLE and BME were nearly identical in all seven tests. 
The three reliability estimates never differed from the true reliabilities by more than .02 (.01 in most cases). A simple implementation of one reliability estimate was found to accurately estimate reliability in CATs. VL - 23 ER - TY - CONF T1 - CAT item calibration T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Hsu, Y. A1 - Thompson, T.D. A1 - Chen, W-H. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego ER - TY - CONF T1 - CAT Item exposure control: New evaluation tools, alternate methods and integration into a total CAT program T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Thomasson, G. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego, CA ER - TY - ABST T1 - A comparative study of item exposure control methods in computerized adaptive testing Y1 - 1998 A1 - Chang, S-W. A1 - Twu, B.-Y. CY - Research Report Series 98-3, Iowa City: American College Testing. N1 - #CH98-03 ER - TY - CONF T1 - A comparison of two methods of controlling item exposure in computerized adaptive testing T2 - Paper presented at the meeting of the American Educational Research Association. San Diego CA. Y1 - 1998 A1 - Tang, L. A1 - Jiang, H. A1 - Chang, H.-H. JF - Paper presented at the meeting of the American Educational Research Association. San Diego CA. ER - TY - CONF T1 - Constructing adaptive tests to parallel conventional programs T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1998 A1 - Thompson, T. A1 - Davey, T. A1 - Nering, M. L. 
JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - San Diego ER - TY - CONF T1 - Constructing passage-based tests that parallel conventional programs T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Thompson, T. A1 - Davey, T. A1 - Nering, M. L. JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - CONF T1 - A hybrid method for controlling item exposure in computerized adaptive testing T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Nering, M. L. A1 - Davey, T. A1 - Thompson, T. JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - ABST T1 - The relationship between computer familiarity and performance on computer-based TOEFL test tasks (Research Report 98-08) Y1 - 1998 A1 - Taylor, C. A1 - Jamieson, J. A1 - Eignor, D. R. A1 - Kirsch, I. CY - Princeton NJ: Educational Testing Service ER - TY - CONF T1 - Some item response theory to provide scale scores based on linear combinations of testlet scores, for computerized adaptive tests T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Thissen, D. JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - CONF T1 - Some reliability estimators for computerized adaptive tests T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1998 A1 - Nicewander, W. A. A1 - Thomasson, G. L. 
JF - Paper presented at the annual meeting of the Psychometric Society CY - Urbana, IL ER - TY - JOUR T1 - Adapting to adaptive testing JF - Personnel Psychology Y1 - 1997 A1 - Overton, R. C. A1 - Harms, H. J. A1 - Taylor, L. R. A1 - Zickar, M. J. VL - 50 ER - TY - JOUR T1 - Diagnostic adaptive testing: Effects of remedial instruction as empirical validation JF - Journal of Educational Measurement Y1 - 1997 A1 - Tatsuoka, K. K. A1 - Tatsuoka, M. M. VL - 34 ER - TY - CONF T1 - The goal of equity within and between computerized adaptive tests and paper and pencil forms. T2 - Paper presented at the annual meeting of the National Council on Measurement in Education Y1 - 1997 A1 - Thomasson, G. L. JF - Paper presented at the annual meeting of the National Council on Measurement in Education CY - Chicago IL ER - TY - CONF T1 - Identifying similar item content clusters on multiple test forms T2 - Paper presented at the Psychometric Society meeting Y1 - 1997 A1 - Reckase, M. D. A1 - Thompson, T.D. A1 - Nering, M. JF - Paper presented at the Psychometric Society meeting CY - Gatlinburg, TN, June ER - TY - CONF T1 - Realistic simulation procedures for item response data T2 - In T. Miller (Chair), High-dimensional simulation of item response data for CAT research. Psychometric Society Y1 - 1997 A1 - Davey, T. A1 - Nering, M. A1 - Thompson, T. JF - In T. Miller (Chair), High-dimensional simulation of item response data for CAT research. Psychometric Society CY - Gatlinburg TN N1 - Symposium presented at the annual meeting of the Psychometric Society, Gatlinburg TN. ER - TY - CONF T1 - Simulation of realistic ability vectors T2 - Paper presented at the Psychometric Society meeting Y1 - 1997 A1 - Nering, M. A1 - Thompson, T.D. A1 - Davey, T. 
JF - Paper presented at the Psychometric Society meeting CY - Gatlinburg TN ER - TY - CONF T1 - A comparison of the traditional maximum information method and the global information method in CAT item selection T2 - annual meeting of the National Council on Measurement in Education Y1 - 1996 A1 - Tang, K. L. KW - computerized adaptive testing KW - item selection JF - annual meeting of the National Council on Measurement in Education CY - New York, NY USA ER - TY - CONF T1 - Constructing adaptive tests to parallel conventional programs T2 - Paper presented at the annual meeting of the American Educational Research Association Y1 - 1996 A1 - Davey, T. A1 - Thomas, L. JF - Paper presented at the annual meeting of the American Educational Research Association CY - New York ER - TY - JOUR T1 - Effect of Rasch calibration on ability and DIF estimation in computer-adaptive tests JF - Journal of Educational Measurement Y1 - 1995 A1 - Zwick, R. A1 - Thayer, D. T. A1 - Wingersky, M. VL - 32 ER - TY - CONF T1 - New item exposure control algorithms for computerized adaptive testing T2 - Paper presented at the annual meeting of the Psychometric Society Y1 - 1995 A1 - Thomasson, G. L. JF - Paper presented at the annual meeting of the Psychometric Society CY - Minneapolis MN ER - TY - ABST T1 - Using simulation to select an adaptive testing strategy: An item bank evaluation program Y1 - 1995 A1 - Hsu, T. C. A1 - Tseng, F. L. CY - Unpublished manuscript, University of Pittsburgh ER - TY - ABST T1 - DIF analysis for pretest items in computer-adaptive testing (Educational Testing Service Research Rep No RR 94-33) Y1 - 1994 A1 - Zwick, R. A1 - Thayer, D. T. A1 - Wingersky, M. CY - Princeton NJ: Educational Testing Service. N1 - #ZW94-33 ER - TY - JOUR T1 - A Simulation Study of Methods for Assessing Differential Item Functioning in Computerized Adaptive Tests JF - Applied Psychological Measurement Y1 - 1994 A1 - Zwick, R. A1 - Thayer, D. T. A1 - Wingersky, M. 
VL - 18 ER - TY - ABST T1 - A simulation study of methods for assessing differential item functioning in computer-adaptive tests (Educational Testing Service Research Rep No RR 93-11) Y1 - 1993 A1 - Zwick, R. A1 - Thayer, D. A1 - Wingersky, M. CY - Princeton NJ: Educational Testing Service. ER - TY - BOOK T1 - A comparison of methods for adaptive estimation of a multidimensional trait Y1 - 1992 A1 - Tam, S. S. CY - Unpublished doctoral dissertation, Columbia University ER - TY - JOUR T1 - The development and evaluation of a system for computerized adaptive testing JF - Dissertation Abstracts International Y1 - 1992 A1 - de la Torre Sanchez, R. KW - computerized adaptive testing VL - 52 ER - TY - ABST T1 - Construction and validation of the SON-R 5-17, the Snijders-Oomen non-verbal intelligence test Y1 - 1991 A1 - Laros, J. A. A1 - Tellegen, P. J. CY - Groningen: Wolters-Noordhoff ER - TY - JOUR T1 - On the reliability of testlet-based tests JF - Journal of Educational Measurement Y1 - 1991 A1 - Sireci, S. G. A1 - Wainer, H. A1 - Thissen, D. VL - 28 ER - TY - BOOK T1 - Computerized adaptive testing: A primer (Eds.) Y1 - 1990 A1 - Wainer, H. A1 - Dorans, N. J. A1 - Flaugher, R. A1 - Green, B. F. A1 - Mislevy, R. J. A1 - Steinberg, L. A1 - Thissen, D. CY - Hillsdale NJ: Erlbaum ER - TY - CHAP T1 - Creating adaptive tests of musical ability with limited-size item pools Y1 - 1990 A1 - Vispoel, W. T. A1 - Twing, J. S. CY - D. Dalton (Ed.), ADCIS 32nd International Conference Proceedings (pp. 105-112). Columbus OH: Association for the Development of Computer-Based Instructional Systems. ER - TY - CHAP T1 - Future challenges Y1 - 1990 A1 - Wainer, H. A1 - Dorans, N. J. A1 - Green, B. F. A1 - Mislevy, R. J. A1 - Steinberg, L. A1 - Thissen, D. CY - H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 233-272). Hillsdale NJ: Erlbaum. ER - TY - CHAP T1 - Reliability and measurement precision Y1 - 1990 A1 - Thissen, D. CY - H. Wainer, N. J. Dorans, R. Flaugher, B. F. 
Green, R. J. Mislevy, L. Steinberg, and D. Thissen (Eds.), Computerized adaptive testing: A primer (pp. 161-186). Hillsdale NJ: Erlbaum. ER - TY - JOUR T1 - Sequential item response models with an ordered response JF - British Journal of Mathematical and Statistical Psychology Y1 - 1990 A1 - Tutz, G. VL - 43 ER - TY - CHAP T1 - Testing algorithms Y1 - 1990 A1 - Thissen, D. A1 - Mislevy, R. J. CY - H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 103-135). Hillsdale NJ: Erlbaum. ER - TY - ABST T1 - Utility of predicting starting abilities in sequential computer-based adaptive tests (Research Report 90-1) Y1 - 1990 A1 - Green, B. F. A1 - Thomas, T. J. CY - Baltimore MD: Johns Hopkins University, Department of Psychology ER - TY - CHAP T1 - Validity Y1 - 1990 A1 - Steinberg, L. A1 - Thissen, D. A1 - Wainer, H. CY - H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 187-231). Hillsdale NJ: Erlbaum. ER - TY - ABST T1 - Item-presentation controls for computerized adaptive testing: Content-balancing versus min-CAT (Research Report 89-1) Y1 - 1989 A1 - Thomas, T. J. A1 - Green, B. F. CY - Baltimore MD: Johns Hopkins University, Department of Psychology, Psychometric Laboratory ER - TY - JOUR T1 - Trace lines for testlets: A use of multiple-categorical-response models JF - Journal of Educational Measurement Y1 - 1989 A1 - Thissen, D. A1 - Steinberg, L. A1 - Mooney, J.A. VL - 26 ER - TY - CHAP T1 - A cognitive error diagnostic adaptive testing system Y1 - 1986 A1 - Tatsuoka, K. K. CY - The 28th ADCIS International Conference Proceedings. Washington DC: ADCIS. ER - TY - JOUR T1 - Some Applications of Optimization Algorithms in Test Design and Adaptive Testing JF - Applied Psychological Measurement Y1 - 1986 A1 - Theunissen, T. J. J. M. VL - 10 IS - 4 ER - 
TY - JOUR T1 - Using microcomputer-based assessment in career counseling JF - Journal of Employment Counseling Y1 - 1986 A1 - Thompson, D. L. VL - 23 ER - TY - JOUR T1 - Latent structure and item sampling models for testing JF - Annual Review of Psychology Y1 - 1985 A1 - Traub, R. E. A1 - Lam, Y. R. VL - 36 ER - TY - ABST T1 - Adaptive testing (Final Report Contract OPM-29-80) Y1 - 1984 A1 - Trollip, S. R. CY - Urbana-Champaign IL: University of Illinois, Aviation Research Laboratory ER - TY - ABST T1 - Application of adaptive testing to a fraction test (Research Report 84-3-NIE) Y1 - 1984 A1 - Tatsuoka, K. K. A1 - Tatsuoka, M. M. A1 - Baillie, R. CY - Urbana IL: University of Illinois, Computer-Based Education Research Laboratory ER - TY - CHAP T1 - The person response curve: Fit of individuals to item response theory models Y1 - 1983 A1 - Trabin, T. E. A1 - Weiss, D. J. CY - D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 83-108). New York: Academic Press. ER - TY - BOOK T1 - The stochastic modeling of elementary psychological processes Y1 - 1983 A1 - Townsend, J. T. A1 - Ashby, G. F. CY - Cambridge: Cambridge University Press ER - TY - ABST T1 - An adaptive Private Pilot Certification Exam Y1 - 1982 A1 - Trollip, S. R. A1 - Anderson, R. I. CY - Aviation, Space, and Environmental Medicine ER - TY - ABST T1 - Criterion-related validity of adaptive testing strategies (Research Report 80-3) Y1 - 1980 A1 - Thompson, J. G. A1 - Weiss, D. J. CY - Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory N1 - #TH80-03 ER - TY - ABST T1 - The danger of relying solely on diagnostic adaptive testing when prior and subsequent instructional methods are different (CERL Report E-5) Y1 - 1979 A1 - Tatsuoka, K. A1 - Birenbaum, M. 
CY - Urbana IL: University of Illinois, Computer-Based Education Research Laboratory. N1 - #TA79-01 ER - TY - JOUR T1 - Sequential testing for instructional classification JF - Journal of Computer-Based Instruction Y1 - 1975 A1 - Thomas, D. B. VL - 1 ER - TY - ABST T1 - Computer-based adaptive testing models for the Air Force technical training environment: Phase I: Development of a computerized measurement system for Air Force technical training Y1 - 1974 A1 - Hansen, D. N. A1 - Johnson, B. F. A1 - Fagan, R. L. A1 - Tan, P. A1 - Dick, W. CY - JSAS Catalogue of Selected Documents in Psychology, 5, 1-86 (MS No. 882). AFHRL Technical Report 74-48. ER - TY - BOOK T1 - A multivariate experimental study of three computerized adaptive testing models for the measurement of attitude toward teaching effectiveness Y1 - 1973 A1 - Tam, P. T.-K. CY - Unpublished doctoral dissertation, Florida State University ER - TY - JOUR T1 - Adaptive testing in an older population JF - Journal of Psychology Y1 - 1965 A1 - Greenwood, D. I. A1 - Taylor, C. VL - 60 ER - TY - ABST T1 - Construction of an experimental sequential item test (Research Memorandum 60-1) Y1 - 1960 A1 - Bayroff, A. G. A1 - Thomas, J. J. A1 - Anderson, A. A. CY - Washington DC: Personnel Research Branch, Department of the Army ER -