An Explanation of Statistical Tools from DocumentingExcellence.com
A consulting practice that helps colleges, organizations, and individuals use quantitative and qualitative assessment tools to analyze and document their quality outcomes, by providing staff development, research design and analysis, and psychometric evaluations.

Item Construction

Construction of multiple choice items in cognitive testing focuses on several questions.

- Item difficulty - How difficult is an item?  An evaluative tool uses items across a range of difficulty, from simple mastery questions through to tough challenges, in order to measure the same range of skills among respondents.  The difficulty of an item is the number of correct answers divided by the total number of responses.  Thus, .5 means that one-half of those answering the question got it correct, and a higher value indicates an easier item.  This calculation is easily completed using a spreadsheet.
- Item discrimination - One common problem arises when an item is answered correctly by someone with minimum knowledge but incorrectly by a more knowledgeable respondent.  The basic issue is "Does an item serve to tell the difference between those with the skills and those without them?"  There are different approaches to measuring item discrimination, but all work around the idea of comparing the percent of top scoring people who got the item correct with the percent of low scoring people who got the item correct.
  In simplest terms, take the top performing set of tests (based on total scores) and count the number of people getting an item correct, then take the same number of lower performing tests and count the number of people getting the item correct.  The top set of tests should have more people with correct answers than the bottom set.  Comparing this difference for each item can show which items helped identify top performance and which ones did not add to that measurement.  Easy items, of course, will tend to have lower levels of discrimination.  There are various ways to rescale this difference in performance statistically into a discrimination measure.  Some testing software includes this measure as output.
- Distracter analysis - Examines the incorrect choices made in a multiple choice item.  Count the number of incorrect responses for each choice: how many people chose A incorrectly, B incorrectly, etc.
  - Next, examine the most popular incorrect choice.  Does it tell us something about why the student is making a mistake?
  - Or, one may design a choice to identify a specific mistake in thinking.  For example, when I test students on calculating "standard deviation for a sample" I include answers that would be found when they use N rather than N-1, and when they forget to take the square root.  If you don't know simple statistics then this example may not work, but understand that there are probably predictable wrong answers.  Using them can help teachers help learners to understand their mistakes and correct them.
  - If NO ONE chooses a particular distracter, then why include it?  It is not helping to differentiate between those who know and those who don't know the focus of the question.
- Construct validity - Does an item measure the skill or knowledge intended?  Do the information and skills required to successfully answer the question match the outcomes and goals expected?  The answer to this question may be based on review by the test's author and other expert authorities.
- Item reliability - This is a broader question, since it is based on how one item relates to other items that are expected to measure a common learning outcome.  One way to analyze this is to examine the correlation of the item with a score (sum) of the other items that share the learning outcome.  A low correlation means the item is not a coherent part of a set of similar questions.
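The item difficulty and item discrimination calculations described above can also be sketched in a few lines of Python instead of a spreadsheet. This is only an illustration under stated assumptions: the score matrix is invented, the function names are hypothetical, and the discrimination measure shown is the simple difference in proportion correct between equal-sized top and bottom groups, which is one of several possible approaches.

```python
# Hypothetical scored responses (invented for illustration):
# one row per respondent, one column per item, 1 = correct, 0 = incorrect.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]


def item_difficulty(rows, item):
    """Number of correct answers divided by the total number of responses."""
    return sum(row[item] for row in rows) / len(rows)


def item_discrimination(rows, item, group_size):
    """Proportion correct in the top group minus proportion correct in the bottom group.

    Respondents are ranked by total score; the top and bottom groups each
    contain group_size respondents (an assumed, adjustable cut-off).
    """
    ranked = sorted(rows, key=sum, reverse=True)
    top = ranked[:group_size]
    bottom = ranked[-group_size:]
    p_top = sum(row[item] for row in top) / group_size
    p_bottom = sum(row[item] for row in bottom) / group_size
    return p_top - p_bottom


for item in range(len(scores[0])):
    print(item, item_difficulty(scores, item), item_discrimination(scores, item, group_size=3))
```

In this invented data set the first item is the easiest and also shows the smallest top-minus-bottom difference, which matches the remark that easy items tend to discriminate less. Note that the total score used for ranking here includes the item being examined.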

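For distracter analysis, a rough Python sketch follows. It counts the incorrect choices on one item, lists any distracter nobody chose, and then shows how the predictable wrong answers in the standard deviation example could be generated. The responses, the answer key, the option letters, the data values, and all function names are assumptions made up for this illustration.

```python
import math
from collections import Counter


def distracter_counts(responses, key):
    """Count how many respondents selected each incorrect choice."""
    return Counter(r for r in responses if r != key)


def unused_distracters(responses, options, key):
    """Incorrect options that no respondent selected."""
    chosen = set(responses)
    return [o for o in options if o != key and o not in chosen]


def sd_answer_choices(values):
    """Correct sample standard deviation plus two predictable wrong answers."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    return {
        "correct": math.sqrt(sum_sq / (n - 1)),  # divides by N-1, takes the root
        "used_N": math.sqrt(sum_sq / n),         # mistake: divides by N
        "no_square_root": sum_sq / (n - 1),      # mistake: forgets the square root
    }


# Hypothetical responses to one item whose correct answer is "B".
responses = ["B", "A", "B", "C", "B", "A", "B", "A", "D", "B"]
wrong = distracter_counts(responses, key="B")
print(wrong.most_common(1))   # the most popular incorrect choice and its count
print(unused_distracters(responses, options="ABCDE", key="B"))

# Hypothetical data a test author might use when writing the answer choices.
choices = sd_answer_choices([2, 4, 4, 4, 5, 5, 7, 9])
print(choices)
```

A respondent who picks the "used_N" or "no_square_root" value has revealed which step went wrong, which is the point of designing distracters around specific mistakes.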

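The item reliability check, correlating an item with the sum of the other items, might look like the following in Python. This is a minimal sketch: it assumes the Pearson correlation is the correlation intended, assumes all four invented items are meant to measure one common learning outcome, and uses hypothetical function names and data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (both must have some variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_rest_correlation(rows, item):
    """Correlate one item with the sum of the other items in the same set."""
    item_scores = [row[item] for row in rows]
    rest_scores = [sum(row) - row[item] for row in rows]
    return pearson(item_scores, rest_scores)


# Hypothetical scored responses: rows are respondents, columns are items.
scores = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

for item in range(len(scores[0])):
    print(item, round(item_rest_correlation(scores, item), 3))
```

In this made-up data the last item correlates negatively with the sum of the others, so it would be flagged as not fitting the set. The item itself is left out of the sum so that it is not correlated with its own score.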
Copyright © 2013 by Peter T. Klassen, Ph.D. Principal, www.DocumentingExcellence.com
26 December, 2012