The Impact of Homogeneity of Answer Choices on Item Difficulty and Discrimination

Erkan Hasan Atalmis, Kahramanmaras Sutcu Imam University, Turkey
Neal Kingston, University of Kansas, Lawrence, USA

SAGE Open, 8(1), February 14, 2018. DOI: 10.1177/2158244018758147

Corresponding Author: Erkan Hasan Atalmis, Kahramanmaras Sutcu Imam University, Avsar Campus, Kahramanmaras 46100, Turkey. Email: eatalmis@ksu.edu.tr

Abstract

This study explored the impact of homogeneity of answer choices on item difficulty and discrimination. Twenty-two matched pairs of elementary and secondary mathematics items were administered to randomly equivalent samples of students. Each item pair comparison was treated as a separate study, with the set of effect sizes analyzed using meta-analysis and a moderator analysis. The results show that multiple-choice (MC) items with homogeneous answer choices tend to be easier than MC items with nonhomogeneous answer choices, but the magnitude was related to item content (algebra vs. geometry) and answer choice construction strategy. For algebra items, items with homogeneous answer choices were easier than those with nonhomogeneous answer choices; however, the difficulty of geometry items with homogeneous and nonhomogeneous answer choices was not statistically different. Taking answer choice construction strategy into account, items with homogeneous answer choices were easier than items with nonhomogeneous answer choices when different construction strategies were applied; when the same construction strategy was applied, the difficulty of items with homogeneous and nonhomogeneous answer choices was not statistically different. In addition, we found that item discrimination does not change significantly across MC items with homogeneous and nonhomogeneous answer choices.

Keywords: multiple-choice item, item-writing guidelines, homogeneity of answer choices, test validity, meta-analysis

Introduction

If construct-irrelevant artifacts of the test-development process interfere with the validity of inferences made from test scores, then the entire testing enterprise is at risk. However, relatively few empirical studies have examined the impact of item-writing guidelines on test performance.

Despite recent enthusiasm for technology-enhanced items, test construction with multiple-choice (MC) items remains popular due to short administration times, scoring objectivity, and the ability to make sound decisions with test results (Haladyna, Downing, & Rodriguez, 2002; McCoubrie, 2004). However, creating an MC item can be challenging. Previous studies have suggested that forming answer choices is a difficult part of the item-construction process (Haladyna & Downing, 1989; Hansen & Dexter, 1997), because each individual answer choice can potentially influence item quality.

Haladyna and his colleagues (2002) proposed 31 item-writing guidelines for creating high-quality MC items that contribute to overall test reliability and validity. The authors examined three commonly cited guidelines related to constructing answer choices: No. 23, "Keep choices homogeneous"; No. 29, "Make all distractors plausible"; and No. 30, "Use typical errors of students."

However, application of one guideline can lead to a violation of another. For instance, answer choices based on common student errors may include choices that are not consistent with one another. In other words, constructing an MC item with plausible answer choices may harm their homogeneity in content and grammatical structure. Specifically, answer choices for math items seem homogeneous if similar number types, numbers of digits, or operations are used among the answer options. Table 1 shows two examples of math items with plausible but not homogeneous answer choices. Choice D of Item 1 includes words, while the other answer choices contain only equations. Choice A of Item 2 is the only integer, while the other choices are fractions.

Table 1. Items With Plausible Answer Choices That Are Not Homogeneous.

Item 1: Which is true between 1/2 and 2/3?
  A. 1/2 > 2/3
  B. 1/2 < 2/3
  C. 1/2 = 2/3
  D. 1/2 and 2/3 cannot be compared
  (Common errors of students; plausible distractors; answer choices are not homogeneous: Choice D is not parallel to the other answer choices.)

Item 2: Which is 3/2 × 4/3?
  A. 2
  B. …
  C. …
  D. …
  (Common errors of students; plausible distractors; answer choices are not homogeneous: Choice A is not parallel to the other answer choices, which are fractions.)

Writing items under the constraints of these three guidelines requires more effort and time. However, the impact of homogeneity of answer choices on item difficulty (proportion of correct answers) and discrimination (correlation with the total test score) remains uncertain. The current study addresses this question.

Haladyna and Rodriguez (2013) suggested that answer choices that are nonhomogeneous in content and grammar can provide a cue to students about the correct answer. Cues can make items easier and can influence item discrimination indices, because students take advantage of such cues to choose correct answers.

Answer choice homogeneity is defined in different ways for MC items based on written statements and MC items based on mathematical expressions. For example, language similarity, content, and grammatical consistency can make word-based answer choices homogeneous. By contrast, answer choices in mathematics items can be made homogeneous by using a consistent number of digits (three vs. two), type of number (integer vs. fraction), and format of answer choices (words vs. numbers).
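For numerical answer choices, the cues just described (number type and digit count) are mechanical enough to sketch in code. The following Python fragment is a rough heuristic, ours for illustration only, since the study's actual coding was done by three human judges; the example values are the answer sets of Item Pair 1 from Table 2 below.

```python
def digit_count(x: int) -> int:
    """Number of digits in an integer, ignoring sign."""
    return len(str(abs(x)))

def numeric_choices_homogeneous(choices) -> bool:
    """Heuristic version of the homogeneity cues described above: the type of
    number (integer vs. fraction) and the digit count should not single out
    any one choice. Illustrative only, not the study's judge-based coding."""
    is_int = [isinstance(c, int) for c in choices]
    if not all(is_int):
        # Mixed number types count as homogeneous only if evenly represented
        # (e.g., two integers and two mixed fractions in a four-choice item).
        return is_int.count(True) in (0, len(choices) // 2)
    digit_counts = [digit_count(c) for c in choices]
    # A single choice with a unique digit count is a potential cue.
    return all(digit_counts.count(k) != 1 for k in set(digit_counts))

print(numeric_choices_homogeneous([102, 127, 602, 627]))  # True: all three digits
print(numeric_choices_homogeneous([68, 102, 120, 180]))   # False: 68 stands alone
```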
Although these differences may seem small, they may be a source of construct-irrelevant variance, caused, perhaps, by testwiseness or, alternatively, by a student's suspicion that the item writer is trying to be tricky. This study examines the impact of homogeneity of answer choices only for mathematics MC items.

The few empirical test-development studies that have focused on answer choice similarity have had mixed results. Smith and Smith (1988) tested the impact of the similarity between the correct choice (key) and incorrect student responses for reading test items using different standard-setting methods, such as the Angoff and Nedelsky methods. They found that answer choice similarity makes items more difficult under both methods. However, they did not adequately define how to classify answer choice similarity. Ascalon, Meyers, Davis, and Smits (2007) found parallel results. They focused only on MC items on a driver's license examination and classified answer choice similarity by comparing all distractors in an item with the correct answer based on distractor content, such as theme, words, and sentence length. By contrast, Green (1984) found the opposite result using general-information items.

Unlike previous studies that explored the impact of answer choice homogeneity on item difficulty for items with word-based answer choices, the current study examines the impact of answer choice homogeneity on the item difficulty and item discrimination of mathematics items with numerical answer choices. More specifically, we examine three research questions:

Research Question 1: Are item difficulty and item discrimination for items with homogeneous answer choices statistically different from those for items with nonhomogeneous answer choices?

Research Question 2: Are item difficulty and item discrimination for items with homogeneous answer choices statistically different from those for items with nonhomogeneous answer choices when controlling for item content (algebra vs. geometry)?

Research Question 3: Are item difficulty and item discrimination for items with homogeneous answer choices statistically different from those for items with nonhomogeneous answer choices when controlling for answer choice construction strategy?

Method

In this study, we replicated previous studies that compared the difficulty and discrimination of items with homogeneous answer choices and items with nonhomogeneous answer choices, using mathematics items. We also analyzed the interaction of these factors with item content and answer choice construction strategy.

Test Items

We selected 22 pairs of MC mathematics items from state test forms in the United States developed for Grades 3 to 8 and high school. We selected the pairs by identifying items that were based on the same specific learning standard (the educational objectives students should possess at critical points of a course), within algebra or geometry, and that had either homogeneous or nonhomogeneous answer choices.
The coding of homogeneity and nonhomogeneity was determined by three judges, researchers experienced in item writing based on state standards and taxonomy. Each MC item had four answer choices: the key and three distractors. Although the items in each pair were written to the same specific learning standards and the stems were set up similarly, the item content was not identical, as can be seen in the item pair examples in Table 2.

Table 2. Items With Homogeneous and Nonhomogeneous Answer Choices.

Item Pair 1 (algebra)
  Homogeneous:    What is 2,508 ÷ 4?    A. 102   B. 127   C. 602   D. 627
  Nonhomogeneous: What is 1,620 ÷ 9?    A. 68    B. 102   C. 120   D. 180

Item Pair 2 (algebra)
  Homogeneous:    Solve for x: …(x + 5) = 15    A. x = 7   B. x = 6   C. x = 9   D. x = 10
  Nonhomogeneous: Solve for x: …(x − 2) = 6     A. x = 6   B. x = 8   C. x = 8   D. x = 10

Item Pair 3 (geometry)
  Homogeneous:    Millie plans to paint a picture on a rectangular piece of paper. She has a piece of paper that measures 13 inches (in.) by 17 in. Exactly how many square inches (in²) of paper does Millie have?
                  A. 30 in²   B. 60 in²   C. 221 in²   D. 442 in²
  Nonhomogeneous: Tasha is covering a rectangular bulletin board with paper. The bulletin board is 14 feet (ft) long and 4 ft wide. Exactly how much paper does Tasha need to completely cover the bulletin board?
                  A. 18 ft²   B. 36 ft²   C. 56 ft²   D. 112 ft²

For Item Pair 1, the items on the left and the right measure the same specific learning standard, "dividing a four-digit number by a one-digit number." The answer choices of the item on the left are homogeneous because all choices are three-digit numbers. Moreover, the final digits of the choices are either 02 or 27, which lets readers form two sets of two similar choices: (102, 127) and (602, 627). This does not create a cue, because no single choice stands apart from all of the others in a way that test takers could exploit during the exam. By contrast, the answer choices of the item on the right are nonhomogeneous because Choice A contains a different number of digits than the other answer choices. Choice A thus stands apart from the others in a particular way, and students might treat this as a cue when selecting an option, which can influence the psychometric properties of the item.

For Item Pair 2, both items measure the same specific learning standard, "multiply each term inside the brackets (both the algebraic term and the number) by the fraction outside the brackets." The answer choices of the item on the left are homogeneous because two of the choices are integers and the other two are mixed fractions; integers and mixed fractions are equally represented in the answer choices. By contrast, the answer choices of the item on the right are nonhomogeneous because Choice D is a whole integer, while the other choices contain fractions.

For Item Pair 3, both items measure the same specific learning standard, "calculate the area of a rectangle in real-world problems." The answer choices of the item on the left are homogeneous because two of the choices contain two digits whereas the other two contain three digits; the answer choices form two equal sets of similar choices in terms of number of digits. By contrast, the answer choices of the item on the right are nonhomogeneous because Choice D contains three digits, while the other choices contain two digits.

After we selected the item pairs, three judges determined whether the answer choices in each item pair were constructed using "the same" or "different" strategies. Considering the item pairs in Table 2, Item Pair 3 was coded as "using the same strategy" because each answer choice of both items in the pair was constructed using the same strategy: Choice A stems from adding length and width, Choice B from the formula for perimeter, Choice C from the area, and Choice D from the area times 2, as shown in the sketch below.
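To make this coding concrete, here is a minimal Python sketch (ours, for illustration; the study itself used human judges, not code) that applies the four strategies just listed to both items of Item Pair 3. Because the same four rules reproduce both choice sets, the pair is coded as "using the same strategy."

```python
def rectangle_item_choices(length: int, width: int) -> dict:
    """Generate the four answer choices of a rectangle-area item using the
    construction strategies coded for Item Pair 3."""
    return {
        "A": length + width,        # strategy: add length and width
        "B": 2 * (length + width),  # strategy: perimeter formula
        "C": length * width,        # strategy: area (the keyed answer)
        "D": 2 * length * width,    # strategy: area times 2
    }

# Homogeneous item (13 in. by 17 in. piece of paper)
print(rectangle_item_choices(13, 17))  # {'A': 30, 'B': 60, 'C': 221, 'D': 442}

# Nonhomogeneous item (14 ft by 4 ft bulletin board)
print(rectangle_item_choices(14, 4))   # {'A': 18, 'B': 36, 'C': 56, 'D': 112}
```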
However, Item Pair 1 was coded as "using different strategies" because not all answer choices of the items in Item Pair 1 were constructed using the same strategy. For example, Choice B of the homogeneous item (the left side) and Choice A of the nonhomogeneous item (the right side) stem from dividing the last three digits of the dividend by the divisor, while Choice C of the homogeneous item and Choice B of the nonhomogeneous item stem from the combination of dividing the first two digits, the third digit, and the last digit of the dividend by the divisor. However, Choice A of the homogeneous item stems from dividing each digit of the dividend by the divisor, while Choice C of the nonhomogeneous item stems from the combination of dividing the first two digits of the dividend by the divisor and the last two digits of the dividend.

Item Pair 2 was also coded as "using different strategies" because not all answer choices of the items in Item Pair 2 were constructed using the same strategy. The items are related to the distributive property of multiplication over addition/subtraction. Only Choice C of the homogeneous and nonhomogeneous items was constructed using the same strategy, which stems from multiplying the first term inside the brackets by the fraction outside the brackets and adding/subtracting the second term inside the brackets to/from the fraction. The other answer choices of the homogeneous and nonhomogeneous items were constructed using different strategies. For the homogeneous item, Choice A stems from making a sign error after multiplying each term inside the brackets by the fraction, and Choice B stems from multiplying the number on the right side of the equation by the fraction rather than multiplying each term inside the brackets by the fraction. For the nonhomogeneous item, Choice B stems from multiplying each term inside the brackets, and Choice D stems from equating the terms inside the brackets to the number on the right side of the equation.

After the answer choice construction strategy of every item pair was examined, we calculated the item difficulty and discrimination for each item within the pairs using the classical test theory (CTT) approach. Item difficulty is calculated as the proportion of examinees answering the item correctly, while item discrimination refers to the ability of an item to discriminate between students with high scores and students with low scores (Thorndike, 2005). To calculate item discrimination for each item in this study, we applied the item-total correlation index, one of the most widely used methods (Downing, 2005).
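Both CTT indices reduce to simple operations on a scored response matrix, as the sketch below shows. The article does not say whether a corrected item-total correlation (excluding the item from the total score) was used, so the uncorrected version here is an assumption, and the data are synthetic.

```python
import numpy as np

def item_difficulty(scores: np.ndarray) -> np.ndarray:
    """Proportion correct per item; scores is an (examinees x items) 0/1 matrix."""
    return scores.mean(axis=0)

def item_total_correlation(scores: np.ndarray) -> np.ndarray:
    """Item-total correlation (discrimination index) per item."""
    total = scores.sum(axis=1)  # each examinee's total test score
    return np.array([
        np.corrcoef(scores[:, j], total)[0, 1] for j in range(scores.shape[1])
    ])

rng = np.random.default_rng(0)
scores = (rng.random((500, 40)) < 0.6).astype(int)  # toy 500-examinee, 40-item test
print(item_difficulty(scores)[:3])
print(item_total_correlation(scores)[:3])
```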
Figure 1 shows the item difficulty and item discrimination index values for each item pair. A total of seven item pairs had geometry content, whereas 15 item pairs had algebra content. In terms of answer option construction strategy, five item pairs were coded as "using the same strategy," while 17 item pairs were coded as "using different strategies" (see Figure 1).

Figure 1. Item difficulty and discrimination for 22 item pairs.

Participants

The items in each pair were administered to randomly equivalent, overlapping, and nonoverlapping samples of students selected from those participating in a state accountability assessment. Field-test items were embedded in operational test forms; thus, students were not aware that these items were field-test items, and the items did not count toward their scores.

To check the similarity of the random samples of students receiving each item within a pair, we took all of the demographic characteristics of the samples into account. For every one of the 22 item pairs, the samples of students differed by no more than .01 in the proportion of students with disabilities, .02 in the proportion of male students, .01 in the proportion of students from any racial group, and .01 in the proportion of students qualifying for a free or reduced-price lunch. In addition, the difficulty of the test forms was comparable: the mean total scores of the test forms differed by no more than .8, and the standard deviations of the total scores differed by no more than .06. Table 3 shows the item characteristics of each pair.

Table 3. Item Characteristics of Each Pair.

Pair   Item type         Sample size   Difficulty   Discrimination
1      Homogeneous           5,282        .69           .36
       Nonhomogeneous        5,266        .62           .32
2      Homogeneous           5,274        .72           .39
       Nonhomogeneous        5,266        .62           .32
3      Homogeneous           3,233        .53           .41
       Nonhomogeneous        3,205        .42           .41
4      Homogeneous           3,216        .29           .24
       Nonhomogeneous        3,217        .24           .25
5      Homogeneous           3,222        .38           .23
       Nonhomogeneous        3,195        .42           .37
6      Homogeneous           1,726        .47           .33
       Nonhomogeneous        1,705        .39           .30
7      Homogeneous           6,309        .72           .41
       Nonhomogeneous        6,262        .63           .43
8      Homogeneous           6,263        .63           .34
       Nonhomogeneous        6,263        .53           .41
9      Homogeneous           6,532        .82           .47
       Nonhomogeneous        6,532        .84           .40
10     Homogeneous           6,360        .94           .37
       Nonhomogeneous        6,360        .92           .40
11     Homogeneous           1,726        .83           .39
       Nonhomogeneous        1,726        .91           .44
12     Homogeneous          12,153        .70           .54
       Nonhomogeneous       12,153        .68           .57
13     Homogeneous          13,283        .84           .46
       Nonhomogeneous       13,283        .75           .47
14     Homogeneous           9,563        .80           .50
       Nonhomogeneous        9,563        .81           .55
15     Homogeneous           9,328        .76           .58
       Nonhomogeneous        9,328        .64           .54
16     Homogeneous           9,328        .98           .16
       Nonhomogeneous        9,328        .97           .15
17     Homogeneous          10,553        .68           .51
       Nonhomogeneous       10,553        .80           .31
18     Homogeneous           9,853        .67           .36
       Nonhomogeneous        9,853        .76           .37
19     Homogeneous          10,802        .91           .43
       Nonhomogeneous       10,802        .81           .37
20     Homogeneous           8,336        .77           .27
       Nonhomogeneous        8,336        .48           .37
21     Homogeneous          17,971        .73           .51
       Nonhomogeneous       17,971        .67           .60
22     Homogeneous          10,782        .83           .40
       Nonhomogeneous       10,782        .69           .61

Meta-Analysis

Meta-analysis is a statistical analysis that integrates and summarizes the results of individual quantitative studies on a particular topic (Glass, 1976). Meta-analysis allows researchers to compute effect sizes that combine means and standard deviations, p values, or correlation coefficients reported by the individual studies (Kulik & Kulik, 1989; Sen & Akbas, 2016). Moreover, moderator analysis can be integrated into a meta-analysis for more precise estimates after the studies are grouped on moderator variables, such as age, gender, and subject area (Alpaslan, Yalvac, & Willson, 2017).

There are two approaches to computing a pooled effect size in meta-analysis: the fixed-effects model and the random-effects model. Under the fixed-effects model, a single common effect size is assumed for all studies, and each study is weighted by its number of observations (Borenstein, Hedges, Higgins, & Rothstein, 2012). Under the random-effects model, the effect size varies from study to study due to demographic characteristics of the samples, such as differences in education level, age, and socioeconomic status (Cooper, 2010).
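The published analyses were run in Comprehensive Meta-Analysis. As a rough sketch of what the two models compute, the following Python fragment pools per-pair effect sizes by inverse-variance weighting (fixed effects) and by DerSimonian-Laird weighting (random effects), and reports the Q and I² heterogeneity statistics quoted below. Treating each pair's difference in proportion correct as its effect size, with a binomial sampling variance, is our assumption; the article does not spell out its exact effect-size formula.

```python
import numpy as np

def pool(effects, variances):
    """Fixed-effects and DerSimonian-Laird random-effects pooling with Q and I^2."""
    e = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                   # inverse-variance (fixed) weights
    fixed = (w * e).sum() / w.sum()               # fixed-effects pooled estimate
    q = (w * (e - fixed) ** 2).sum()              # Cochran's Q heterogeneity statistic
    df = len(e) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0  # % variance beyond sampling error
    c = w.sum() - (w ** 2).sum() / w.sum()
    tau2 = max(0.0, (q - df) / c)                 # between-pair variance component
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    random_ = (w_re * e).sum() / w_re.sum()       # random-effects pooled estimate
    return fixed, random_, q, i2

# One "study" per item pair: difference in proportion correct between the
# homogeneous and nonhomogeneous item; values are the first three pairs in Table 3.
pairs = [(0.69, 5282, 0.62, 5266), (0.72, 5274, 0.62, 5266), (0.53, 3233, 0.42, 3205)]
effects = [ph - pn for ph, nh, pn, nn in pairs]
variances = [ph * (1 - ph) / nh + pn * (1 - pn) / nn for ph, nh, pn, nn in pairs]
print(pool(effects, variances))
```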
Item difficulty. Each item pair comparison was treated as a separate study in a random-effects meta-analysis. Thus, differences in sample characteristics and item content across item pairs were accounted for by the random-effects error term. The software package Comprehensive Meta-Analysis was used to perform the analyses (Borenstein, Hedges, Higgins, & Rothstein, 2005). Table 4 presents the results of the fixed- and random-effects models, which are based on different assumptions: the true effect is the same across all item pairs in the fixed-effects model, whereas it varies from one item pair to another in the random-effects model (Borenstein et al., 2012).

Table 4. Fixed-Effects and Random-Effects Models for Item Difficulty.

Model    # of studies    M      95% CI           z value   p value   Q          df (Q)   p (Q)   I²
Fixed    22              0.12   [0.11, 0.13]     35.33     .00       3,420.99   21       .00     99.39
Random   22              0.11   [0.02, 0.20]      2.51     .01

The variation between item pairs (heterogeneity) was statistically significant, Q(21) = 3,420.99, p < .001. The corresponding I² was 99.39, which means that 99% of the observed variance in item pair effect sizes came from differences between item pairs that were not explained by sampling variability. The point estimates of the fixed-effects and random-effects models in Table 4 show the difference in item difficulty between the items with homogeneous answer choices and the items with nonhomogeneous answer choices. Items with homogeneous answer choices were easier than items with nonhomogeneous answer choices, with average effect sizes of .12 in the fixed-effects model and .11 in the random-effects model. The 95% confidence interval ranged from .11 to .13 in the fixed-effects model and from .02 to .20 in the random-effects model.

We also conducted a mixed-effects moderator analysis for the content area of the items (algebra vs. geometry) and for answer choice construction strategy, as shown in Table 5.

Table 5. Moderator Analysis for Content Area and Answer Choice Construction Strategy (Mixed-Effects Analysis).

                          n     M       95% CI            z       p
Content area
  Algebra                 15    0.17    [0.08, 0.26]      3.75    .000
  Geometry                 7   −0.02    [−0.14, 0.11]    −0.31    .756
  Total                   22    0.11    [0.03, 0.18]      2.86    .004
Answer choice construction strategy
  Different strategy      17    0.12    [0.03, 0.22]      2.46    .014
  The same strategy        5    0.06    [−0.13, 0.26]     0.65    .519
  Total                   22    0.11    [0.02, 0.20]      2.48    .013

According to Table 5, algebra items with homogeneous answer choices were easier than algebra items with nonhomogeneous answer choices, with an average effect size of .17, which is statistically significant (z = 3.75, p < .001). However, the difficulty of geometry items with homogeneous answer choices was not statistically different from the difficulty of geometry items with nonhomogeneous answer choices (z = −0.31, p = .756).

For items whose answer choices were constructed using different strategies (Table 5), items with homogeneous answer choices were easier than items with nonhomogeneous answer choices, with an average effect size of .12, which is statistically significant (z = 2.46, p < .05). However, for items whose answer choices were constructed using the same strategy, the difficulty of items with homogeneous answer choices was not statistically different from the difficulty of items with nonhomogeneous answer choices (z = 0.65, p = .519).

Item discrimination. After we transformed the item discrimination indices for each of the 44 items from item-total correlations to Fisher z values, we generated fixed- and random-effects models for the item discrimination values using the software package Comprehensive Meta-Analysis (Borenstein et al., 2005). Table 6 shows the results of the fixed- and random-effects models.

Table 6. Fixed-Effects and Random-Effects Models for Item Discrimination.

Model    # of studies    M      95% CI            z value   p value   Q          df (Q)   p (Q)   I²
Fixed    22              0.00   [−0.01, 0.00]     −1.17     .22       2,033.11   21       .00     98.97
Random   22              0.00   [−0.07, 0.06]     −0.08     .93

The value of I² was 98.97, which means that 99% of the observed variance in item pair effect sizes came from differences between item pairs that were not explained by sampling variability. The variation between item pairs (heterogeneity) was statistically significant, Q(21) = 2,033.11, p < .001. The point estimates of the fixed-effects and random-effects models showed that the difference in item discrimination between the items with homogeneous answer choices and the items with nonhomogeneous answer choices was not statistically significant: fixed-effects model, z = −1.17, p = .22; random-effects model, z = −0.08, p = .93.
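The Fisher z transformation stabilizes the variance of correlations before pooling. The sketch below shows how a per-pair discrimination effect size could be formed on that scale; the 1/(n − 3) variance is the standard large-sample formula and is our assumption about what the software applied.

```python
import numpy as np

def fisher_z(r: float) -> float:
    """Fisher z transformation, 0.5 * ln((1 + r) / (1 - r))."""
    return np.arctanh(r)

def discrimination_effect(r_hom, n_hom, r_non, n_non):
    """Difference of two independent item-total correlations on the Fisher z
    scale, with the standard large-sample variance 1/(n - 3) per correlation."""
    effect = fisher_z(r_hom) - fisher_z(r_non)
    variance = 1.0 / (n_hom - 3) + 1.0 / (n_non - 3)
    return effect, variance

# Item Pair 1 from Table 3: r = .36 (n = 5,282) vs. r = .32 (n = 5,266)
print(discrimination_effect(0.36, 5282, 0.32, 5266))  # ≈ (0.045, 0.00038)
```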
Results and Discussion

The purpose of this study was to explore the impact of the homogeneity of answer choices on the item difficulty and discrimination of mathematics items. The findings showed that, overall, items with homogeneous answer choices were easier than items with nonhomogeneous answer choices, but the result depended on item content (algebra vs. geometry) and answer choice construction strategy. Algebra items with homogeneous answer choices were easier than algebra items with nonhomogeneous answer choices; however, the difficulty of geometry items with homogeneous answer choices was not statistically different from the difficulty of geometry items with nonhomogeneous answer choices. Taking answer choice construction strategy into consideration, items with homogeneous answer choices were easier than items with nonhomogeneous answer choices when different strategies were used to construct the answer choices. On the contrary, when the answer choices were constructed using the same strategy, the difficulty of items with homogeneous and nonhomogeneous answer choices was not statistically different. Moreover, the very large I² statistic indicates that, even after considering algebra versus geometry items, the source of most of the variation in difficulty across homogeneity conditions remains undetermined. Finally, the impact of homogeneity of answer choices on discrimination was not statistically significant.

The results of this study add empirical support to the growing body of test-development studies. Specifically, this study contributes to research on the impact of answer choice homogeneity on item psychometric characteristics.

First, unlike past studies that focused on word-based items, the current study examined the impact of homogeneity of answer choices on the psychometric characteristics of mathematics items. With regard to the effect on item difficulty, our overall findings are consistent with Green (1984) yet inconsistent with Smith and Smith (1988) and Ascalon et al. (2007), in that on average the items with homogeneous answer choices were easier. However, when answer choices were constructed using the same strategy, the difficulty of items with homogeneous answer choices was not statistically different from the difficulty of items with nonhomogeneous answer choices, which is not consistent with the findings of Green (1984), Smith and Smith (1988), or Ascalon et al. (2007). One hypothesis that might explain these results is that the impact of option homogeneity depends on item content, because the discrepant results came from studies using reading test items, driver's license items, and general-information items. The common characteristic of such word-based items is that the probability of choosing the correct answer by using the answer choices may be far higher than for mathematics items, because examinees may choose the correct answer to a mathematics item only if they know the solution, regardless of the answer choices. The existing literature on the effect of the number of options on item psychometric properties may be considered to support this claim. For example, Rodriguez (2005) conducted a meta-analysis of 56 empirical studies, most of which contained social sciences and language arts items (word-based items), to determine the impact of the number of options on item difficulty and discrimination. The findings showed that items become easier when the number of options decreases. However, a recent study by Atalmış and Kingston (2017) designed MC mathematics items with four options and with three options constructed using the same strategy and found that item difficulty was not statistically different across the four-option and three-option items. These findings suggest that if the options of mathematics items are constructed using the same strategy, item difficulty may not be changed by the homogeneity of answer choices or by the number of options.

Another contribution of this study, which differs from previous studies, is that it examined the impact of answer choice homogeneity not only on item difficulty but also on item discrimination. The findings showed that item discrimination was not statistically different between mathematics items with homogeneous answer choices and those with nonhomogeneous answer choices, even when different item contents and answer choice construction strategies were taken into account. A hypothesis that might explain these results is that the items with homogeneous and nonhomogeneous answer choices were constructed to the same specific learning standards with similarly constructed stems. Thus, when parallel mathematics item pairs are constructed, learning standards and stem similarity may take higher priority than item content and answer choice construction strategy in keeping item discrimination similar between parallel items.

The last contribution of this study is the use of meta-analysis in an unconventional context. Although meta-analysis is usually used for integrating and summarizing the results of individual quantitative studies on a particular topic, in this study each item pair comparison was treated as a separate study to examine the impact of answer option homogeneity on item psychometric properties.

The results of this empirical study also apply to item writers and classroom teachers. Writing MC items with plausible distractors based on common student errors is always suggested (Haladyna et al., 2002). However, creating plausible distractors may affect answer choice homogeneity.
Although each approach has merits, creating items with both plausible and homogeneous answer choices poses a challenge for item writers and classroom teachers. That is, item writers might need to spend significant extra time constructing a fourth or fifth option that is both plausible and homogeneous, and thus construct fewer items in a given amount of time. The results of the current study are informative for test creators weighing such approaches while designing a test.

Considering the results of this study together with the existing literature on item development, we recommend that the use of plausible answer options be given a higher priority than the use of homogeneous answer options in constructing MC mathematics items, but that both guidelines be considered.

There are some limitations to this study. First, there were only 22 pairs of items; thus, the number of items per grade level and the variety of types of nonhomogeneity were limited. Second, this study was conducted with mathematics items only. Most importantly, most of the variation in difficulty differences between the items with homogeneous and nonhomogeneous answer choices was not explained. We recommend additional exploration of item content and answer option construction strategy as moderator variables.

Authors' Note

An earlier version of this article was presented at the National Council on Measurement in Education (NCME) in Philadelphia, PA, USA, in 2014.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Alpaslan, M. M., Yalvac, B., & Willson, V. (2017). A meta-analytical review of the relationship between personal epistemology and self-regulated learning. Turkish Journal of Education, 6, 48-67.
Ascalon, M. E., Meyers, L. S., Davis, B. W., & Smits, N. (2007). Distractor similarity and item-stem structure: Effects on item difficulty. Applied Measurement in Education, 20, 153-170. doi:10.1080/08957340701301272
Atalmış, E. H., & Kingston, N. M. (2017). Three, four, and none of the above options in multiple-choice items. Turkish Journal of Education, 6, 142-157. doi:10.19128/turje.333687
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. (2005). Comprehensive meta-analysis (Version 2.2.048) [Software]. Englewood, NJ: Biostat.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. (2012). Introduction to meta-analysis. Chichester, UK: John Wiley.
Cooper, H. (2010). Research synthesis and meta-analysis. Thousand Oaks, CA: SAGE.
Downing, S. M. (2005). The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Advances in Health Sciences Education, 10(2), 133-143. doi:10.1007/s10459-004-4019-5
Glass, G. V. (1976). Primary, secondary and meta-analysis of research. Educational Researcher, 5(10), 3-8.
Green, K. (1984). Effects of item characteristics on multiple-choice item difficulty. Educational and Psychological Measurement, 44, 551-561. doi:10.1177/0013164484443002
Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2, 37-50. doi:10.1207/s15324818ame0201_3
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15, 309-334. doi:10.1207/S15324818AME1503_5
Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. New York, NY: Routledge.
Hansen, J. D., & Dexter, L. (1997). Quality multiple-choice test questions: Item-writing guidelines and an analysis of auditing test banks. The Journal of Education for Business, 73, 94-97.
Kulik, J. A., & Kulik, C. L. C. (1989). The concept of meta-analysis. International Journal of Educational Research, 13, 227-340.
McCoubrie, P. (2004). Improving the fairness of multiple-choice questions: A literature review. Medical Teacher, 26, 709-712. doi:10.1080/01421590400013495
Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2), 3-13. doi:10.1111/j.1745-3992.2005.00006.x
Sen, S., & Akbas, N. (2016). A study on multilevel meta-analysis methods. Journal of Measurement and Evaluation in Education and Psychology, 7(1), 1-17.
Smith, R. L., & Smith, J. K. (1988). Differential use of item information by judges using Angoff and Nedelsky procedures. Journal of Educational Measurement, 25, 259-274. doi:10.1111/j.1745-3984.1988.tb00307.x
Thorndike, R. M. (2005). Measurement and evaluation in psychology and education (7th ed.). Upper Saddle River, NJ: Pearson.

Author Biographies

Erkan Hasan Atalmis is an assistant professor at Kahramanmaras Sutcu Imam University in Turkey. He earned his PhD in the Research, Evaluation, Measurement, and Statistics Program at the University of Kansas. His research interests include item and test development, types of grading systems in educational assessment, and supplementary education.

Neal Kingston, PhD, came to the University of Kansas in 2006 and is a professor in the Research, Evaluation, Measurement, and Statistics Program and director of the Achievement and Assessment Institute. His research focuses on large-scale assessment, with particular emphasis on how it can better support student learning. He is the principal investigator/director or co-principal investigator of several large research projects, including Design and Development of a Dynamic Learning Maps Alternate Assessment, Kansas Assessment Program, Career Pathways Assessment System, and Development and Validation of Online Adaptive Reading Motivation Measures.

The Impact of Homogeneity of Answer Choices on Item Difficulty and Discrimination:

SAGE Open , Volume 8 (1): 1 – Feb 14, 2018

Loading next page...
 
/lp/sage/the-impact-of-homogeneity-of-answer-choices-on-item-difficulty-and-Gxxs7AUi1l
Publisher
SAGE
Copyright
Copyright © 2022 by SAGE Publications Inc, unless otherwise noted. Manuscript content on this site is licensed under Creative Commons Licenses.
ISSN
2158-2440
eISSN
2158-2440
DOI
10.1177/2158244018758147
Publisher site
See Article on Publisher Site

Abstract

This study explored the impact of homogeneity of answer choices on item difficulty and discrimination. Twenty-two matched pairs of elementary and secondary mathematics items were administered to randomly equivalent samples of students. Each item pair comparison was treated as a separate study with the set of effect sizes analyzed using meta-analysis and a moderator analysis. The results show that multiple-choice (MC) items with homogeneous answer choices tend to be easier than MC items with nonhomogeneous answer choices, but the magnitude was related to item content (algebra vs. geometry) and answer choice construction strategy. For algebra items, items with homogeneous answer choices are easier than those with nonhomogeneous answer choices. However, the difficulty of geometry items with homogeneous and nonhomogeneous is not statistically different. Taking into account answer choice construction strategy, the findings showed that items with homogeneous answer choices were easier than items with nonhomogeneous answer choices when different strategy was applied. However, the same construction strategy was applied; thus, the difficulty of items with homogeneous answer choices and nonhomogeneous answer choices was not statistically different. In addition, we found that item discrimination does not significantly change across MC items with homogeneous and nonhomogeneous answer choices. Keywords multiple-choice item, item-writing guidelines, homogeneity of answer choices, test validity, meta-analysis However, application of one guideline can lead to a Introduction violation of another. For instance, answer choices based If construct-irrelevant artifacts of the test-development pro- on common student errors may include choices that are cess interfere with the validity of inferences made from test not consistent with one another. In other words, construct- scores, then the entire testing enterprise is at risk. However, ing an MC item with plausible answer choices may harm relatively few empirical studies have examined the impact of their homogeneity in content and grammatical structure. item-writing guidelines on test performance. Specifically, answer choices for math items seems homo- Despite recent enthusiasm for technology-enhanced items, geneous if similar number types, number of digits, or test construction with multiple-choice (MC) items remains operations are used among answer options. Table 1 shows popular due to short administration times, scoring objectivity, two examples of math items with plausible but not homo- and the ability to make sound decisions with test results geneous answer choices. Choice D of Item 1 includes (Haladyna, Downing, & Rodriguez, 2002; McCoubrie, 2004). words, while the other answer choices contain only equa- However, creating an MC item can be challenging. Previous tions. Choice A of Item 2 is the only integer number, while studies have suggested that forming answer choices is a dif- the other choices are fractions. ficult part of the item-construction process (Haladyna & Writing items under the constraints of these three guidelines Downing, 1989; Hansen & Dexter, 1997), because each indi- requires more effort and time. However, the impact of homoge- vidual answer choice can potentially influence item quality. 
neity of answer choices on item difficulty (proportion of correct Haladyna and his colleagues (2002) proposed 31 item- writing guidelines for creating high-quality MC items that 1 Kahramanmaras Sutcu Imam University, Turkey contribute to overall test reliability and validity. The authors University of Kansas, Lawrence, USA examined three commonly cited guidelines related to con- Corresponding Author: structing answer choices: No.23 “Keep choices homoge- Erkan Hasan Atalmis, Kahramanmaras Sutcu Imam University, Avsar neous,” No.29 “Make all distracters plausible,” and No.30 Campus, Kahramanmaras 46100, Turkey. “Use typical errors of students.” Email: eatalmis@ksu.edu.tr Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). 2 SAGE Open Table 1. Items with Plausible Answer Choices That Are Not Homogeneous. Item 1 Item 2 1 2 3 4 Which is true between and ? Which is × ? 2 3 2 3 1 2 A. A. 2 2 3 B. 1 2 B. < 2 3 C. 1 2 C. = 5 2 3 D. 1 2 D. and cannot be compared 2 3 •• Common errors of students •• Common errors of students •• Plausible distractors •• Plausible distractors •• Answer choices are not homogeneous •• Answer choices are not homogeneous (Choice D is not parallel to other answer choices) (Choice A is not parallel to other answer choices) answer) and discrimination (correlation with the total test score) Unlike previous studies that explored the impact of remains uncertain. The current study addresses this question. answer choice homogeneity on item difficulty for the items Haladyna and Rodriguez (2013) suggested that answer with word-based answer choices, the current study examines choices that are nonhomogeneous in content and grammar the impact of answer choice homogeneity on item difficulty can provide a cue to students about the correct answer. Cues and item discrimination of mathematics items with numeri- can make items easier and influence item discrimination cal answer choices. More specifically, we examine three indices because students take advantage of such cues to research questions: choose correct answers. Answer choice homogeneity is defined in different ways for Research Question 1: Are item difficulty and item dis- MC items based on written statements and MC items based on crimination for items with homogeneous answer choices mathematical expressions. For example, language similarity, statistically different from item difficulty for items with content, and grammatical consistency can make word-based nonhomogeneous answer choices? answer choices homogeneous. On the contrary, answer choices Research Question 2: Are item difficulty and item dis- in mathematics items can be made homogeneous by using a con- crimination for items with homogeneous answer choices sistent number of digits (three vs. two), type of number (integer statistically different from item difficulty for items with vs. fraction), and format of answer choices (words vs. numbers). nonhomogeneous answer choices when controlling item Although these differences may seem small, they may be a content (algebra vs. geometry)? 
source of construct-irrelevant variance, caused, perhaps, by test Research Question 3: Are item difficulty and item dis- wiseness or alternatively a student’s suspicion that the item crimination for items with homogeneous answer choices writer is trying to be tricky. This study examines the impact of statistically different from item difficulty for items with homogeneity of answer choices only for mathematics MC items. nonhomogeneous answer choices when controlling The few empirical test-development studies that have answer choice construction strategy? focused on answer choice similarity have had mixed results. Smith and Smith (1988) tested the impact of the similarity Method between correct choice (key) and incorrect student response for reading test items using different standard-setting methods, In this study, we replicated previous studies that compared such as Angoff and Nedelsky methods. They found that answer the difficulty and discrimination of items with homogeneous choice similarity makes items more difficult using both meth- answer choices and items with nonhomogeneous answer ods. However, they did not adequately define how to classify choices for mathematics items. We have also analyzed the answer choice similarity. Ascalon, Meyers, Davis, and Smits interaction of these factors with item content and answer (2007) found parallel results. They focused on only MC items choices construction strategy. on a driver’s license examination and classified answer choice similarity by comparing all distractors in an item with the cor- Test Items rect answer based on distractor content, such as theme, words, and sentence length. On the contrary, Green (1984) found the We selected 22 pairs of MC mathematics items from a state opposite result using general information items. test forms in the United States developed for Grades 3 to 8 Atalmis and Kingston 3 Table 2. Items with Homogeneous and Nonhomogeneous Answer Choices. Homogeneous Nonhomogeneous Item Pair 1 (algebra) What is 2,508 ÷ 4? What is 1,620 ÷ 9? A. 102 A. 68 B. 127 B. 102 C. 602 C. 120 D. 627 D. 180 Item Pair 2 (algebra) Solve for x: Solve for x: (x + 5) = 15 (x – 2) = 6 A. x = 7 A. x = 6 B. x = 6 B. x = 8 C. x = 9 C. x = 8 D. x = 10 D. x = 10 Item Pair 3 (geometry) Millie plans to paint a picture on a rectangular Tasha is covering a rectangular bulletin board piece of paper. She has a piece of paper with paper. The bulletin board is 14 feet (ft) that measures 13 inches (in) by 17 in. long and 4 ft wide. Exactly how much paper Exactly how many square inches (in ) of does Tasha need to completely cover the paper does Millie have? bulletin board? 2 2 A. 30 in A. 18 ft 2 2 B. 60 in B. 36 ft 2 2 C. 221 in C. 56 ft 2 2 D. 442 in D. 112 ft and high school. We selected the pairs by identifying items For Item Pair 2, both items measure the same specific that were based on the same specific learning standard, learning standard, which is “multiply each term inside the which is educational objectives students should possess at brackets (both algebraic term and number) by fraction out- critical point of any course, within algebra or geometry and side the brackets.” The answer choices of the item on the left that had either homogeneous or nonhomogeneous answer are homogeneous because two of the choices are integers and choices. 
The coding of homogeneity and nonhomogeneity the other two choices are mixed fractions; integers and mixed was determined by three judges who are researchers experi- fractions are equally represented in the answer choices. On enced at item-writing based on state standards and taxonomy. the contrary, the answer choices of the item on the right side Each MC item had four answer choices: the key and three are nonhomogeneous because Choice D is a whole integer, distractors. Although the items in each pair were written to while the other choices contain fractions. the same specific learning standards and the stems were set For Item Pair 3, both items measure the same specific up similarly, the item content was not identical as can be seen learning standard, which is “calculate the area of rectangle in the item pair examples in Table 2. in real word problems.” The answer choices of the item on For Item Pair 1, the items on the right and the left side the left are homogeneous because two of the choices con- measure the same specific learning standard, which is “divid- tain two digits, whereas the other two choices contain three ing a four-digit numbers by two-digit number.” The answer digits. The answer choices have two equal sets of similar choices of the item on the left are homogeneous because all choices in terms of number of digits. On the contrary, the choices have three-digit numbers. Moreover, the final digits answer choices of the item on the right side are nonhomo- in the choices are either 02 or 27. This also allows readers geneous because Choice D contains three digits, while the make a set of two similar choices: (102, 127) and (602, 627). other choices contain two digits. It does not harm dissimilarity among answer choices because After we selected the item pairs, three judges determined only one choice which is exactly different from others is not whether answer choices in item pairs were constructed using selected by take takers during the exam to provide cue. On “the same” or “different” strategies. Considering item pairs the contrary, the answer choices of the item on the right are in Table 1, Item Pair 3 was coded as “using the same strat- nonhomogeneous because Choice A contains a different egy” because each answer choice of items in Item Pair 3 was number of digits than the other answer choices. It means that constructed using the same strategy. It means that Choice A Choice A is exactly different from other in a particular way stems from adding length and width, Choice B from the for- and student might tend to select this option as cue. This influ- mula for perimeter, Choice C from area, and Choice D from ences psychometric properties of items. area times 2. 4 SAGE Open However, Item Pair 1 was coded as “using different strat- selected from those participating in a state accountability egy” because not all answer choices of items in Item Pair 1 assessment. Field test items were embedded in operational were constructed using the same strategy. For example, test forms and thus students were not aware that these items Choice B of homogeneous item (the left side) and Choice A were field test items and did not count toward their scores. of nonhomogeneous item (the right side) stem from dividing To check the similarity of the random samples of students the last three digits of dividend by divisor while Choice C of receiving each item within a pair, we took all of the demo- homogeneous item and Choice B of nonhomogeneous item graphic characteristics of the samples into account. 
For every stem from the combination of dividing the first two digits, one of the 22 item pairs, the samples of students differed by the third digit, and the last digit of dividend by divisor. no more than .01 in the proportion of disability of students, However, Choice A of homogeneous item stems from divid- .02 in the proportion of male students, .01 in the proportion ing each digit of dividend by divisor while Choice C of non- of students from any racial group, and .01 in the proportion homogeneous item stem from the combination of dividing of students qualified for a free or reduced-price lunch. In the first two digits of dividend by divisor and last two digits addition, the difficulty of each test form was the same of dividend. because mean total scores of the test forms were not more Item Pair 2 also was coded as “using different strategy” than .8, whereas standard deviation in the mean total scores because not all answer choices of items in Item Pair 2 were was not more than .06. Table 3 shows item characteristics of constructed using the same strategy. The items are related to each pair. the distributive property of multiplication over addition/sub- traction. Only Choice C of homogeneous item and nonhomo- Meta-Analysis geneous item was constructed using the same strategy, which stems from multiplying the first term inside the brackets by Meta-analysis is a statistical analysis integrating and sum- fraction outside the brackets and adding/subtracting the sec- marizing the results of individual quantitative studies on a ond term inside the brackets and/from the fraction. For other particular topic (Glass, 1976). Meta-analysis allows research- answer choices of homogeneous and nonhomogeneous items, ers to compute effect sizes to combine mean and standard they were constructed using different strategy. For homoge- deviation values, p values, or correlation coefficients pro- neous item, Choice A stems from making sign error after mul- posed by studies (Kulik & Kulik, 1989; Sen & Akbas, 2016). tiplying each term inside the brackets by the function. Choice Moreover, moderator analysis can be integrated into meta- B of homogeneous item stems from multiplying the number analysis for more precise estimate after studies are grouped on right side of equation by the function rather than multiply- based on moderator variables, such as age, gender, and sub- ing each term inside the brackets by the function. For nonho- ject area (Alpaslan, Yalvac, & Willson, 2017). mogeneous item, Choice B stems from multiplying each term There are two approaches to compute effect size in meta- inside the brackets. Choice D stems from equating the terms analysis: fixed-effects model and random-effects model. inside brackets to the number on right side of equation. Under the fixed model, the same effect size is calculated for After answer choices construction strategy of all item pairs all studies and weighted based on the number of observations was examined, we calculated the item difficulty and discrimi- by that study (Borenstein, Hedges, Higgins, & Rothstein, nation for each item within the pairs by using classical test 2012). Under random-effects model, effect size varies from theory (CTT) approach. Item difficulty is calculated as the study to study due to demographic characteristics of sam- proportion of examinees answering the item correctly while ples, such as differences in education level, age, and socio- item discrimination refers to the ability of an item to discrimi- economic status (Cooper, 2010). 
nate between students with high scores and low scores (Thorndike, 2005). We applied item-total correlation index to Item difficulty. Each item pair comparison was treated as a calculate item discrimination for each item in this study, separate study in a random-effects meta-analysis. Thus, the which is one of most widely used method (Downing, 2005). difference in sample characteristics and item content across Figure 1 shows item difficulty and item discrimination item pairs was accounted for by the random-effects error index values for each item pairs. A total of seven item pairs term. The software package Comprehensive Meta-Analysis had geometry content, whereas 14 item pairs had algebra was used to perform the analyses (Borenstein, Hedges, Hig- content. In terms of answer options construction strategy, gins, & Rothstein, 2005). Table 4 presents the results of five item pairs were coded as “using the same strategy” while fixed- and random-effects models, which are based on differ- 17 item pairs were coded as “using the different strategy” ent assumptions. True effect was the same across all item (see Figure 1). pairs in the fixed-effects model, whereas it varied from one item pair to another in the random-effects model (Borenstein et al., 2012). Participants The variation between item pairs, heterogeneity, was statis- The items in each pair were administered to randomly equiv- tically significant, Q(21) = 3,420.99, p < .001. The correspond- alent, overlapping, and nonoverlapping samples of students ing I was 99.39, which means that 99% of observed variance Atalmis and Kingston 5 Figure 1. Item difficulty and discrimination for 22 item pairs. in item pair effect sizes came from differences between item According to Table 5, algebra items with homogeneous pairs that were not explained by sampling variability. The point answer choices were easier than algebra items with nonho- estimates of the fixed-effects model and random-effects model mogeneous answer choices, with an average effect size of in Table 4 show the difference in item difficulty between the .17, which is statistically significant (z = 3.75, p < .001). items with homogeneous answer choices and the items with However, the difficulty of geometry items with homoge- nonhomogeneous answer choices. Items with homogeneous neous answer choices was not statistically different from the answer choices were easier than items with nonhomogeneous difficulty of geometry items with nonhomogeneous answer answer choices with average effect sizes of .12 in fixed-effects choices (z = –0.31, p = .756). model and .11 in random-effects model. The 95% confidence For items whose answer choices were constructed using interval ranged from .11 to .13 in fixed-effects model and .02 to different strategy in Table 5, items with homogeneous .20 in random-effects model. answer choices were easier than items with nonhomoge- We also conducted a mixed effects moderator analysis for neous answer choices, with an average effect size of .12, the content area of the items (algebra vs. geometry) and which is statistically significant (z = 2.46, p < .05). However, answer choice construction strategy, as shown in Table 5. for items whose answer choices constructed using the same 6 SAGE Open Table 3. Item Characteristics of Each Pair. Item discrimination. 
Item difficulty. Each item pair comparison was treated as a separate study in a random-effects meta-analysis. Thus, the differences in sample characteristics and item content across item pairs were accounted for by the random-effects error term. The software package Comprehensive Meta-Analysis was used to perform the analyses (Borenstein, Hedges, Higgins, & Rothstein, 2005). Table 4 presents the results of the fixed- and random-effects models, which are based on different assumptions: the true effect was assumed to be the same across all item pairs in the fixed-effects model, whereas it was allowed to vary from one item pair to another in the random-effects model (Borenstein et al., 2012).

The variation between item pairs (heterogeneity) was statistically significant, Q(21) = 3,420.99, p < .001. The corresponding I² was 99.39, which means that 99% of the observed variance in item pair effect sizes came from differences between item pairs that were not explained by sampling variability.
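The I² statistic reported here is a direct function of Q and its degrees of freedom, I² = 100(Q − df)/Q, so the reported value can be checked in one line (a quick verification, not part of the original analysis):

```python
def i_squared(q, df):
    """Percentage of observed variance attributable to real between-study
    differences rather than sampling error."""
    return max(0.0, 100.0 * (q - df) / q)

print(round(i_squared(3420.99, 21), 2))  # 99.39, as reported in Table 4
```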
The point estimates of the fixed-effects and random-effects models in Table 4 show the difference in item difficulty between the items with homogeneous answer choices and the items with nonhomogeneous answer choices. Items with homogeneous answer choices were easier than items with nonhomogeneous answer choices, with average effect sizes of .12 in the fixed-effects model and .11 in the random-effects model. The 95% confidence interval ranged from .11 to .13 in the fixed-effects model and from .02 to .20 in the random-effects model.

Table 4. Fixed-Effects and Random-Effects Models for Item Difficulty.

Model    # of studies   M      95% CI lower   95% CI upper   z       p     Q          df   p(Q)   I²
Fixed    22             0.12   0.11           0.13           35.33   .00   3,420.99   21   .00    99.39
Random   22             0.11   0.02           0.20           2.51    .01

We also conducted a mixed-effects moderator analysis for the content area of the items (algebra vs. geometry) and the answer choice construction strategy, as shown in Table 5. According to Table 5, algebra items with homogeneous answer choices were easier than algebra items with nonhomogeneous answer choices, with an average effect size of .17, which is statistically significant (z = 3.75, p < .001). However, the difficulty of geometry items with homogeneous answer choices was not statistically different from the difficulty of geometry items with nonhomogeneous answer choices (z = −0.31, p = .756).

For items whose answer choices were constructed using different strategies, items with homogeneous answer choices were easier than items with nonhomogeneous answer choices, with an average effect size of .12, which is statistically significant (z = 2.46, p < .05). However, for items whose answer choices were constructed using the same strategy, the difficulty of items with homogeneous answer choices was not statistically different from the difficulty of items with nonhomogeneous answer choices (z = 0.65, p = .519).

Table 5. Moderator Analysis for Content Area and Answer Choice Construction Strategy (Mixed-Effects Analysis).

Moderator                n    M       95% CI lower   95% CI upper   z       p
Content area
  Algebra                15   0.17    0.08           0.26           3.75    .000
  Geometry               7    −0.02   −0.14          0.11           −0.31   .756
  Total                  22   0.11    0.03           0.18           2.86    .004
Answer choice construction strategy
  Different strategy     17   0.12    0.03           0.22           2.46    .014
  Same strategy          5    0.06    −0.13          0.26           0.65    .519
  Total                  22   0.11    0.02           0.20           2.48    .013

Item discrimination. After we transformed the item discrimination indices for each of the 44 items from item-total correlations to Fisher z values, we generated fixed- and random-effects models for the item discrimination values using the software package Comprehensive Meta-Analysis (Borenstein et al., 2005). Table 6 shows the results of the fixed- and random-effects models.
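Because correlations are bounded at ±1 and have a skewed sampling distribution, they are conventionally converted to Fisher z values before pooling and converted back afterward. Below is a minimal sketch of the transformation; the example values are the two discrimination indices of Item Pair 1 from Table 3.

```python
import numpy as np

def fisher_z(r):
    """Fisher z transformation: z = 0.5 * ln((1 + r) / (1 - r)) = artanh(r)."""
    return np.arctanh(r)

def inverse_fisher_z(z):
    """Back-transform a pooled z value to the correlation metric."""
    return np.tanh(z)

# Per-pair effect size on the z scale, using Item Pair 1 from Table 3.
z_diff = fisher_z(0.36) - fisher_z(0.32)
```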
The variation between item pairs (heterogeneity) was statistically significant, Q(21) = 2,033.11, p < .001, and the corresponding I² was 98.97, which means that 99% of the observed variance in item pair effect sizes came from differences between item pairs that were not explained by sampling variability. The point estimates of the fixed-effects and random-effects models showed that the difference in item discrimination between the items with homogeneous answer choices and the items with nonhomogeneous answer choices was not statistically significant (fixed-effects model: z = −1.17, p = .22; random-effects model: z = −0.08, p = .93).

Table 6. Fixed-Effects and Random-Effects Models for Item Discrimination.

Model    # of studies   M      95% CI lower   95% CI upper   z       p     Q          df   p(Q)   I²
Fixed    22             0.00   −0.01          0.00           −1.17   .22   2,033.11   21   .00    98.97
Random   22             0.00   −0.07          0.06           −0.08   .93

Results and Discussion

The purpose of this study was to explore the impact of the homogeneity of answer choices on the item difficulty and discrimination of mathematics items. The findings showed that, overall, items with homogeneous answer choices were easier than items with nonhomogeneous answer choices, but the result depended on item content (algebra vs. geometry) and answer choice construction strategy. Algebra items with homogeneous answer choices were easier than algebra items with nonhomogeneous answer choices. However, the difficulty of geometry items with homogeneous answer choices was not statistically different from the difficulty of geometry items with nonhomogeneous answer choices. Taking answer choice construction strategy into consideration, items with homogeneous answer choices were easier than items with nonhomogeneous answer choices when different strategies were used to construct the answer choices. On the contrary, when the answer choices were constructed using the same strategy, the difficulty of items with homogeneous answer choices and items with nonhomogeneous answer choices was not statistically different. Moreover, the very large I² statistic indicates that, even when considering algebra versus geometry items, the source of most of the variation in difficulty across homogeneity conditions remained undetermined. Also, the impact of homogeneity of answer choices on discrimination was not statistically significant.

The results of this study provide empirical support to the growing body of test-development studies. Specifically, this study contributes to research on the impact of answer choice homogeneity on item psychometric characteristics. First, unlike past studies that focused on word-based items, the current study examined the impact of homogeneity of answer choices on the psychometric characteristics of mathematics items. With regard to the effect on item difficulty, our overall findings are consistent with Green (1984), yet inconsistent with Smith and Smith (1988) and Ascalon, Meyers, Davis, and Smits (2007), in that on average the items with homogeneous answer choices were easier. However, when answer choices were constructed using the same strategy, the difficulty of items with homogeneous answer choices was not statistically different from the difficulty of items with nonhomogeneous answer choices, which is not consistent with the findings of Green (1984), Smith and Smith (1988), or Ascalon et al. (2007). One hypothesis that might explain these results is that the impact of option homogeneity depends on item content, because the discrepant results came from studies using reading test items, driver license items, and general information items. The common characteristic of such word-based items is that the probability of choosing the correct answer by working from the answer choices may be far higher than for mathematics items, because examinees may choose the correct answer to a mathematics item only if they know the solution, regardless of the answer choices. The existing literature on the effect of the number of options on item psychometric properties may be considered to support this claim. For example, Rodriguez (2005) conducted a meta-analysis of 56 empirical studies, most of which contained social sciences and language arts items (i.e., word-based items), to determine the impact of the number of options on item difficulty and discrimination. The findings showed that items become easier when the number of options decreases. However, a recent study by Atalmış and Kingston (2017) designed MC mathematics items with four options and with three options constructed using the same strategy and found that item difficulty was not statistically different across the four-option and three-option items. These findings suggest that if the options of mathematics items are constructed using the same strategy, item difficulty may not be changed by the homogeneity of answer choices or the number of options.
Another contribution of this study is that it examined the impact of answer choice homogeneity not only on item difficulty but also on item discrimination, which differs from previous studies. The findings showed that item discrimination did not differ statistically between mathematics items with homogeneous answer choices and items with nonhomogeneous answer choices, even across different item contents and answer choice construction strategies. A hypothesis that might explain these results is that the items with homogeneous and nonhomogeneous answer choices were constructed from the same specific learning standards and item stems. Thus, when parallel mathematics item pairs are constructed, learning standards and stem similarity may be given higher priority than item content and answer choice construction strategy in keeping item discrimination similar between parallel items.

The last contribution of this study is the use of meta-analysis in an unconventional context. Although meta-analysis is usually used for integrating and summarizing the results of individual quantitative studies on a particular topic, in this study each item pair comparison was treated as a separate study to examine the impact of answer option homogeneity on item psychometric properties.

The results of this empirical study also apply to item writers and classroom teachers. Writing MC items with plausible distractors based on common student errors is always suggested (Haladyna et al., 2002). However, creating plausible distractors may affect answer choice homogeneity. Although each approach has merits, creating items with both plausible and homogeneous answer choices poses a challenge for item writers and classroom teachers. That is, item writers might need to spend significant extra time constructing a fourth or fifth option that is both plausible and homogeneous, and thus construct fewer items in a given amount of time. The results of the current study are informative for test creators weighing such approaches while designing a test.

Combining the results of this study with the existing literature on item development, we recommend that the use of plausible answer options be given a higher priority than the use of homogeneous answer options in constructing MC mathematics items, but that both guidelines be considered.

There are some limitations of this study. First, there were only 22 pairs of items; thus, the number of items per grade level and the variety of types of nonhomogeneity were limited. Second, this study was conducted with only mathematics items. Most importantly, most of the variation in difficulty differences between the items with homogeneous and nonhomogeneous answer choices was not explained. We recommend additional exploration of item content and answer choice construction strategy as moderator variables.

Authors' Note

An earlier version of this article was presented at the National Council on Measurement in Education (NCME) conference in Philadelphia, PA, USA, in 2014.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.
References

Alpaslan, M. M., Yalvac, B., & Willson, V. (2017). A meta-analytical review of the relationship between personal epistemology and self-regulated learning. Turkish Journal of Education, 6, 48-67.

Ascalon, M. E., Meyers, L. S., Davis, B. W., & Smits, N. (2007). Distractor similarity and item-stem structure: Effects on item difficulty. Applied Measurement in Education, 20, 153-170. doi:10.1080/08957340701301272

Atalmış, E. H., & Kingston, N. M. (2017). Three, four, and none of the above options in multiple-choice items. Turkish Journal of Education, 6, 142-157. doi:10.19128/turje.333687

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. (2005). Comprehensive meta-analysis (Version 2.2.048) [Software]. Englewood, NJ: Biostat.

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. (2012). Introduction to meta-analysis. Chichester, UK: John Wiley.

Cooper, H. (2010). Research synthesis and meta-analysis. Thousand Oaks, CA: SAGE.

Downing, S. M. (2005). The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Advances in Health Sciences Education, 10(2), 133-143. doi:10.1007/s10459-004-4019-5

Glass, G. V. (1976). Primary, secondary and meta-analysis of research. Educational Researcher, 5(10), 3-8.

Green, K. (1984). Effects of item characteristics on multiple-choice item difficulty. Educational and Psychological Measurement, 44, 551-561. doi:10.1177/0013164484443002

Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2, 37-50. doi:10.1207/s15324818ame0201_3

Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15, 309-334. doi:10.1207/S15324818AME1503_5

Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. New York, NY: Routledge.

Hansen, J. D., & Dexter, L. (1997). Quality multiple-choice test questions: Item-writing guidelines and an analysis of auditing test banks. The Journal of Education for Business, 73, 94-97.

Kulik, J. A., & Kulik, C. L. C. (1989). The concept of meta-analysis. International Journal of Educational Research, 13, 227-340.

McCoubrie, P. (2004). Improving the fairness of multiple-choice questions: A literature review. Medical Teacher, 26, 709-712. doi:10.1080/01421590400013495

Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2), 3-13. doi:10.1111/j.1745-3992.2005.00006.x

Sen, S., & Akbas, N. (2016). A study on multilevel meta-analysis methods. Journal of Measurement and Evaluation in Education and Psychology, 7(1), 1-17.

Smith, R. L., & Smith, J. K. (1988). Differential use of item information by judges using Angoff and Nedelsky procedures. Journal of Educational Measurement, 25, 259-274. doi:10.1111/j.1745-3984.1988.tb00307.x

Thorndike, R. M. (2005). Measurement and evaluation in psychology and education (7th ed.). Upper Saddle River, NJ: Pearson.

Author Biographies

Erkan Hasan Atalmis is an assistant professor at Kahramanmaras Sutcu Imam University in Turkey. He earned his PhD in the Research, Evaluation, Measurement, and Statistics Program at the University of Kansas. His research interests include item and test development, grading systems in educational assessment, and supplementary education.

Neal Kingston, PhD, came to the University of Kansas in 2006 and is a professor in the Research, Evaluation, Measurement, and Statistics Program and director of the Achievement and Assessment Institute. His research focuses on large-scale assessment, with particular emphasis on how it can better support student learning. He is the principal investigator/director or co-principal investigator of several large research projects, including Design and Development of a Dynamic Learning Maps Alternate Assessment, Kansas Assessment Program, Career Pathways Assessment System, and Development and Validation of Online Adaptive Reading Motivation Measures.
