Construction of multiple choice items in cognitive testing focuses on several questions.
Item difficulty - How difficult is an item? In evaluative tools one uses a range of difficulty, from simple mastery questions through tough challenges, in order to measure the same range of skills among respondents. The difficulty of an item is the number of correct answers divided by the total number of responses. Thus, a difficulty of .5 means that half of those answering the question got it correct. This calculation is easily completed using a spreadsheet.
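As a minimal sketch, the same calculation takes only a few lines of Python; the 0/1 response codes below are hypothetical.

```python
# Hypothetical scored responses for one item:
# 1 = correct, 0 = incorrect, one entry per respondent.
responses = [1, 0, 1, 1, 0, 1, 0, 0]

# Item difficulty: proportion of respondents answering correctly.
difficulty = sum(responses) / len(responses)
print(f"Item difficulty: {difficulty:.2f}")  # 0.50, matching the example above
```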
Item discrimination - One common problem arises when an item is answered correctly by someone with minimal knowledge but incorrectly by a more knowledgeable respondent. The basic issue is: "Does an item serve to tell the difference between those with the skills and those without them?" There are different approaches to measuring item discrimination, but all center on comparing the percentage of top-scoring respondents who got the item correct with the percentage of low-scoring respondents who got it correct.
In the simplest terms, take the top-performing set of tests (based on total scores) and count the number of people getting an item correct, then take the same number of lower-performing tests and count the number of people getting the item correct. The top set of tests should have more correct answers than the bottom set. Comparing this difference for each item shows which items helped identify top performance and which added nothing to that measurement; a sketch of this calculation appears after this paragraph. Easy items, of course, will tend to have lower levels of discrimination. There are various ways to scale this difference in performance statistically, and some testing software includes the measure in its output.
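Here is one possible sketch of that upper/lower comparison in Python. The score matrix is hypothetical, and the 27% group size is just one common convention; any fixed fraction works the same way.

```python
# Hypothetical scored tests: each row is one respondent's 0/1 item scores.
tests = [
    [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1], [1, 0, 1, 0],
    [0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1],
]

n_group = max(1, round(len(tests) * 0.27))  # size of upper and lower groups

# Rank respondents by total score, then split off the top and bottom groups.
ranked = sorted(tests, key=sum, reverse=True)
upper, lower = ranked[:n_group], ranked[-n_group:]

# Discrimination index per item: proportion correct in the upper group
# minus proportion correct in the lower group.
for item in range(len(tests[0])):
    p_upper = sum(t[item] for t in upper) / n_group
    p_lower = sum(t[item] for t in lower) / n_group
    print(f"Item {item + 1}: discrimination = {p_upper - p_lower:+.2f}")
```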
Distracter Analysis - Examines the incorrect choices made on a multiple choice item. Count the number of responses for each incorrect choice: how many people chose A incorrectly, how many chose B incorrectly, and so on.
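A brief sketch of such a tally in Python, assuming raw answer letters and an answer key are available (both made up here):

```python
from collections import Counter

# Hypothetical raw answers for one item, with "C" keyed as correct.
answers = ["A", "C", "B", "C", "D", "A", "C", "A", "B", "C"]
key = "C"

# Tally how often each incorrect choice (distracter) was selected.
distracters = Counter(a for a in answers if a != key)
for choice, count in sorted(distracters.items()):
    print(f"Choice {choice}: chosen incorrectly {count} time(s)")
```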
Construct Validity - Does an item measure the skill or knowledge intended? Do the information and skills required to answer the question successfully match the expected outcomes and goals? Answering this question may rely on review by authorities: the test's author and other subject-matter experts.
Item reliability - A broader question, since it depends on how one item relates to other items that are expected to measure a common learning outcome. One way to analyze this is to examine the correlation of the item with the score (sum) of the other items sharing that outcome. A low correlation means the item is not a coherent part of the set of similar questions.
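One way to compute that item-to-rest correlation is sketched below in Python; the 0/1 score matrix is hypothetical, and statistics.correlation requires Python 3.10 or later.

```python
from statistics import correlation

# Hypothetical 0/1 scores: rows are respondents, columns are items
# believed to measure the same learning outcome.
scores = [
    [1, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0],
    [1, 1, 1, 1], [0, 1, 0, 0], [0, 0, 0, 1],
]

item = 0  # index of the item under review

# Correlate the item with the sum of the *other* items (the "rest" score).
item_scores = [row[item] for row in scores]
rest_scores = [sum(row) - row[item] for row in scores]
r = correlation(item_scores, rest_scores)
print(f"Item {item + 1} item-rest correlation: {r:.2f}")
```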