An Explanation of Statistical Tools from DocumentingExcellence.com
A consulting practice that helps colleges, organizations, and individuals use quantitative and qualitative assessment tools to analyze and document their quality outcomes, through staff development, research design and analysis, and psychometric evaluations.

Assessing / Evaluating an Item

Once an item is developed and deployed into tests, one needs to consider whether the item and the test continue to serve their purpose.

Assessment and review of items and tests should be a continual process. These examinations should occur after each use of an instrument, and should consider the following criteria.

- Questions of validity and reliability
  - Has the skill/knowledge base changed? Items should be reviewed by experts or others knowledgeable in the field for currency and accuracy as they relate to the subject being examined.
  - Are the targeted skills/knowledge being adequately tested?
  - Are there voids or over-examined areas?
  - Is each area reliably examined? This is a balancing issue. If one uses only one or a few items per area, then the evaluation is subject to a high level of test error. However, using many items, while more reliable, may also make a test unmanageable.
  - Are the item's difficulty and discrimination acceptable? Changes in these measures may indicate either problems with an item or compromises in the security of a test. After some time, items may become widely known and thus lose their value; such changes should show up as lower difficulty and discrimination values (that is, the item becomes easier and separates stronger from weaker test takers less well).
  - Are test results predictive of other independent measures of skill and knowledge? That is, are the test and its items satisfying their design?
- Questions of structure and purpose
  - Is there a need to adjust the overall test's structural results to standardize it, or to make it more or less difficult? If yes, then items may be changed to accomplish these goals.
  - Is there a need to adjust the overall test for political or social reasons? A test that validly and reliably tests subjects and produces an average of 50% does not tell us more than a test that results in an average of 80%. However, in some settings it may be appropriate to adjust a test to help with stakeholder (public) or student perceptions and attributions. Such changes do not mean that one is "dumbing down" the test, provided one maintains the discrimination value of items. However, when the range of subjects' knowledge is broad, tests may have lower averages in order to evaluate that wide range of knowledge and not "top out."
  - Is the purpose of the test to evaluate relative strength among test takers, or is it to judge a pass/fail level of knowledge? Item choice may be guided by a design that targets a normal, uniform, or bimodal distribution of scores.
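
The difficulty, discrimination, and reliability questions in the list above can be checked numerically after each administration. The following is a minimal sketch in Python, not a prescribed method: it assumes items are scored 0 (wrong) or 1 (right), the function names and the `responses` data are hypothetical, difficulty is taken as the proportion answering correctly (so a higher value means an easier item), discrimination is taken as the correlation between an item and the score on the remaining items (one common choice among several), and the Spearman-Brown formula is used to project reliability when a test is lengthened with comparable items.

```python
from statistics import mean, pstdev


def item_difficulty(responses, item):
    """Proportion of test takers who answered the item correctly."""
    return mean([row[item] for row in responses])


def item_discrimination(responses, item):
    """Correlation between the item score and the score on all other items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    sd_item, sd_rest = pstdev(item_scores), pstdev(rest_scores)
    if sd_item == 0 or sd_rest == 0:
        return 0.0  # no variation, so the item cannot separate test takers
    mean_item, mean_rest = mean(item_scores), mean(rest_scores)
    covariance = mean(
        [(x - mean_item) * (y - mean_rest)
         for x, y in zip(item_scores, rest_scores)]
    )
    return covariance / (sd_item * sd_rest)


def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical data: one row per test taker, one column per item (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

for item in range(len(responses[0])):
    p = item_difficulty(responses, item)
    d = item_discrimination(responses, item)
    print(f"item {item}: difficulty={p:.2f} discrimination={d:.2f}")
```

Tracking these values across administrations is one way to notice the drift described above: an item whose proportion correct rises while its discrimination falls may have become widely known. The `spearman_brown` helper illustrates the balancing issue of test length, since doubling the number of comparable items raises projected reliability but also doubles the burden on test takers.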


Copyright © 2012 by Peter T. Klassen, Ph.D. Principal, www.DocumentingExcellence.com
2 March, 2012