An Explanation of Statistical Tools from DocumentingExcellence.com
A consulting practice that helps colleges, organizations, and individuals use quantitative and qualitative assessment tools to analyze and document their quality outcomes, through staff development, research design and analysis, and psychometric evaluations.

Reliability

Reliability is a measure of consistency.  Assessments of reliability focus on both the characteristics of specific items and the characteristics of an instrument (a scale or other construct).

Measures based on Raters and Evaluators

• Inter-rater reliability evaluates how consistently different raters score the same responses.  It can be estimated by applying split-half or other non-parametric procedures.
• Rating rubrics provide reference criteria for systematic comparison of open responses (essays, written responses, visual or auditory performances, etc.).  Rubrics can increase the reliability of performance evaluations.  That reliability can be evaluated and compared among teams, or with reference to specific items or evaluators.
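One way to put a number on agreement between two raters is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The page does not name a specific statistic, so the choice of kappa here is an assumption, and the rubric scores below are hypothetical. A minimal sketch:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of cases on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1-4) from two raters on the same ten essays.
scores_a = [3, 2, 4, 4, 1, 3, 2, 3, 4, 2]
scores_b = [3, 2, 4, 3, 1, 3, 2, 2, 4, 2]
kappa = cohens_kappa(scores_a, scores_b)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa treats the rubric levels as unordered categories. For ordered rubric levels, a weighted kappa or an intraclass correlation may be a better fit.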

At the item level using Item Response Theory (IRT): IRT is a set of statistical procedures that produce both measurements of individuals' performances and evaluations of item characteristics.  These procedures are primarily used in standardized testing of knowledge concepts (and are not very helpful for personality and behavior instruments).  IRT assumes that the set of items being examined is coherent (as evaluated in CTT; see below).

• Item discrimination evaluates how effectively an item separates those who meet a criterion from those who do not.
• Item difficulty evaluates how hard an item is.
• The pseudo-guessing estimate evaluates the probability that a respondent can guess a correct answer without knowing the intended concept.
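These three item characteristics correspond to the parameters of what is commonly called the three-parameter logistic (3PL) model. The page does not name a specific model, so treating it as 3PL is an assumption, as are the symbol names `theta` (ability), `a` (discrimination), `b` (difficulty), and `c` (pseudo-guessing). The sketch below only evaluates the item response curve for given parameters; estimating those parameters from response data is a separate fitting step not shown here.

```python
import math


def prob_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: respondent ability
    a: item discrimination (steepness of the curve)
    b: item difficulty (ability at which the curve is halfway between c and 1)
    c: pseudo-guessing (lower bound for very low ability)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderately discriminating, average difficulty,
# with a 20% chance of guessing correctly.
for theta in (-2.0, 0.0, 2.0):
    p = prob_correct_3pl(theta, a=1.5, b=0.0, c=0.2)
    print(f"theta={theta:+.1f}  P(correct)={p:.3f}")
```

When ability equals difficulty, the probability sits halfway between the guessing floor and 1. A larger `a` makes the curve rise more sharply around `b`, which is what lets the item separate respondents just below the difficulty level from those just above it.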

At the scale or construct level using classical test theory (CTT) item analysis:

• Factor analysis using principal component extraction from a pool of items, used to identify items that may cohere as a scale.
• Correlation of an item to a scale or other construct.
• Cronbach's alpha (α) to evaluate scale consistency.  Conceptually, this statistic is the mean of all possible split-half reliabilities.
• Split-half reliability is a process to evaluate item consistency.  (Split-half reliability could be calculated by hand and was frequently used before computer software made more powerful alternatives available.)
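Three of the CTT statistics above can be sketched in a few lines. The response matrix is hypothetical (six respondents, four items on a 1-5 scale), the odd/even split is just one of many possible splits, and the Spearman-Brown step-up and the "corrected" item-total correlation (item versus the sum of the other items) are common conventions assumed here rather than taken from the page.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sum((xi - mx) ** 2 for xi in x)
    sy = sum((yi - my) ** 2 for yi in y)
    return cov / (sx * sy) ** 0.5


def cronbach_alpha(rows):
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def split_half_reliability(rows):
    """Odd/even split-half correlation, stepped up with Spearman-Brown."""
    half_1 = [sum(r[0::2]) for r in rows]  # items 1, 3, ...
    half_2 = [sum(r[1::2]) for r in rows]  # items 2, 4, ...
    r_half = pearson(half_1, half_2)
    return 2 * r_half / (1 + r_half)


def corrected_item_total(rows, item):
    """Correlation of one item with the sum of the remaining items."""
    item_scores = [r[item] for r in rows]
    rest_scores = [sum(r) - r[item] for r in rows]
    return pearson(item_scores, rest_scores)


# Hypothetical responses: 6 respondents x 4 items.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [4, 4, 5, 5],
]
alpha = cronbach_alpha(responses)
split_half = split_half_reliability(responses)
item_total = [corrected_item_total(responses, i) for i in range(4)]
print(f"alpha={alpha:.3f}  split-half={split_half:.3f}")
print("corrected item-total:", [round(r, 3) for r in item_total])
```

A different split of the items would generally give a different split-half value, which is one reason alpha is often preferred as a single summary.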

At the instrument level:

• Issues of consistency in test-retest reliability focus on tools producing similar scores when administered at different times.
• Issues of agreement among alternative forms (parallel-forms reliability).  Many of the issues arising in constructing multiple forms can be addressed by applying IRT procedures.
• Issues of agreement among human evaluators/raters or rubrics (inter-rater or inter-observer reliability).

When considering reliability, some statistics may help both users and developers assess a tool's effectiveness, and a developer may want to share these with users.  Other measures of specific item and tool reliability may be of interest primarily to developers and are therefore not shared with users.  Low reliability on a scale may indicate problems with construct validity.

The following pages have more information:

• Statistical processes for tool assessment by a developer
• Statistical summary of a tool assessment for a user


Copyright © 2010 by Peter T. Klassen, Ph.D. Principal, www.DocumentingExcellence.com
9 January, 2010