An Explanation of Statistical Tools from DocumentingExcellence.com
A consulting practice that helps colleges, organizations, and individuals use quantitative and qualitative assessment tools to analyze and document their quality outcomes by providing staff development, research design and analysis, and psychometric evaluations.

Glossary

classification refers to the process of assigning cases to categories when those cases share some common identification but do not have social contact, interaction, or awareness of their shared aggregation.

confidence level refers to the probability criterion used to select a standard value from a Normal distribution Z table. It is the level of probability set by a researcher, and is thus a criterion that influences hypothesis testing. It is NOT determined by data, but rather by a researcher's specified standard. Many researchers select 95% as the confidence level. The 90% level is sometimes selected when the research has a small number of cases. When a research topic requires a higher level of confidence, a 99% level may be selected. These probabilities correspond to areas under the normal distribution, as listed in a Z-table.
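As a minimal sketch of how a confidence level maps to a standard value, the Z-table lookup can be reproduced with Python's standard library. The function name z_critical is a hypothetical helper, and a two-sided test is assumed.

```python
from statistics import NormalDist

def z_critical(confidence_level: float) -> float:
    """Two-sided critical Z value for a confidence level such as 0.95."""
    tail = (1.0 - confidence_level) / 2.0   # probability left in each tail
    return NormalDist().inv_cdf(1.0 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> Z = {z_critical(level):.3f}")
# 90% confidence -> Z = 1.645
# 95% confidence -> Z = 1.960
# 99% confidence -> Z = 2.576
```

The higher the confidence level the researcher sets, the larger the standard value a result must exceed.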

confidence interval (CI) is an interval estimate of a population parameter: a range of values, calculated from a sample at a chosen confidence level, that the parameter might have. Different samples produce different intervals.
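The following sketch shows one common way to build a confidence interval for a mean, assuming a Z-based interval (sample mean plus or minus Z times the standard error). The scores are hypothetical, and mean_confidence_interval is an illustrative name. With very small samples a t distribution is often used instead of Z, which this sketch does not attempt.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence_level=0.95):
    """Z-based interval for the population mean, estimated from one sample."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    center = mean(sample)
    standard_error = stdev(sample) / sqrt(len(sample))  # stdev uses N-1
    return center - z * standard_error, center + z * standard_error

scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 75]  # hypothetical sample
low, high = mean_confidence_interval(scores)
print(f"95% CI for the mean: {low:.1f} to {high:.1f}")
```

Calling the function with confidence_level=0.99 on the same data gives a wider interval, because more confidence requires a wider range.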

control (or statistical control) refers to procedures used with real-world data when the condition of "all other things being equal" cannot be attained by experimental design.

correlation refers to a statistical measure of relationship.  This statistic, sometimes referred to as r or rho, can vary from -1 through 0 to +1.  Positive numbers indicate that two variables or measures tend to increase or decrease together, i.e. as one increases so does the other.  Negative numbers indicate that two measures tend to move in opposite directions, i.e. as one increases the other one decreases.  Correlations near 0 indicate that two measures are not linearly related.  Evaluating when a correlation is significant (sufficiently distant from 0) requires specific calculations related to each analysis.
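To illustrate the range from -1 through 0 to +1, here is a small sketch that computes the Pearson correlation by hand. This assumes the Pearson product-moment form of correlation, and all data values are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

hours = [1, 2, 3, 4, 5]
rises_together = [2, 4, 6, 8, 10]    # increases as hours increase
moves_opposite = [10, 8, 6, 4, 2]    # decreases as hours increase
no_pattern = [2, 1, 3, 1, 2]         # no linear trend with hours

print(pearson_r(hours, rises_together))  # close to +1
print(pearson_r(hours, moves_opposite))  # close to -1
print(pearson_r(hours, no_pattern))      # close to 0
```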

critical value is a criterion identified from a distribution table: the cutoff against which a calculated test statistic is compared. Different distributions may be used depending on the statistical procedure.

discriminate refers to a process of classifying or determining that there is an identifiable difference.  It is an important statistical concept, not to be confused with the social process associated with prejudice and unjust treatment.

descriptive statistics refers to statistics used to summarize characteristics or outcomes from a data set.

experiment refers to a research design that meets a critical criterion.  That criterion requires two or more groups of subjects to be assigned to a controlled setting, condition, or experience in which the researcher manipulates the critical conditions while all other conditions are held constant.  Most frequently this is accomplished by using three conditions.

group refers to two or more people who share some common identification and have social contact and interaction.

inferential statistics refers to statistics used to make conclusions or generalizations about populations based on samples including the shared characteristics, similarities, differences, and patterns of responses.

level of confidence see confidence level.

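mean average refers to a value that is at the middle of a set of values in terms of amount: the sum of the values divided by the number of values.  The total amount by which values exceed the mean equals the total amount by which values fall below it.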

median average refers to the specific score or case that is at the middle of a set of cases or scores when they are ordered by value.  One-half of the cases or data points are higher and one-half are lower, in terms of the number of measures.

mode refers to the most frequently occurring measurement.  This form of average is used most frequently with categories, and data sets with a limited range of possible values.
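The three averages defined above can give different answers for the same data. The sketch below uses hypothetical values and Python's statistics module to show each one, along with the sense in which the mean balances amounts while the median balances counts of cases.

```python
from statistics import mean, median, mode

values = [1, 1, 4, 7, 12]  # hypothetical measurements

m = mean(values)       # 25 / 5 = 5
mid = median(values)   # middle case of the ordered list = 4
most = mode(values)    # most frequently occurring value = 1

# The mean balances amounts: total distance above equals total below.
amount_above = sum(v - m for v in values if v > m)   # 2 + 7 = 9
amount_below = sum(m - v for v in values if v < m)   # 4 + 4 + 1 = 9

# The median balances counts: as many cases above as below.
cases_above = sum(1 for v in values if v > mid)      # 7 and 12
cases_below = sum(1 for v in values if v < mid)      # 1 and 1

print(m, mid, most)
print(amount_above, amount_below, cases_above, cases_below)
```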

normal distribution The phrase "a bell shaped curve" usually refers to this distribution. The area between the mean (center) and 1 standard deviation (SD) contains 34.1% of the data points. Thus, ±1 SD contains 68.2% of the data points. A Z-table provides this distribution as decimal proportions.

[Figure: standard distribution. Graphic from Wikipedia, 10 Jan 2012]
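The proportions quoted for the normal distribution can be checked with a short sketch. It assumes the standard normal distribution (mean 0, SD 1), which is what a Z-table tabulates. The variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

mean_to_one_sd = z.cdf(1) - z.cdf(0)     # area from the mean to +1 SD
within_one_sd = z.cdf(1) - z.cdf(-1)     # area within plus or minus 1 SD
within_two_sd = z.cdf(2) - z.cdf(-2)     # area within plus or minus 2 SD

print(f"mean to +1 SD: {mean_to_one_sd:.3f}")   # about 0.341
print(f"within 1 SD:   {within_one_sd:.3f}")    # about 0.683
print(f"within 2 SD:   {within_two_sd:.3f}")    # about 0.954
```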

population refers to the total of all cases that meet a defined criterion.  A population is usually too large to be measured except by census, and therefore sampling is used to collect representative data from which inferences can be drawn.

qualitative refers to observations, measures, and research processes that focus on characteristics that cannot be counted; they are descriptive classifications. Examples include male/female, preferences for type of food, etc.

quantitative refers to observations, measures, and research processes that focus on measurement of quantity, where each value represents some quantity or change in quantity.

reliability refers to the idea that a tool is consistent in what it reports.  One issue is whether a tool administered at different times produces similar scores.  Another is whether, when two or more evaluators look at the same evidence, the scores of each evaluator are consistent with the others.  A third is whether each of several items used to construct a scale loads on that scale in a consistent manner.

sample or representative sample refers to a set of cases drawn from the larger population.  In order to be representative, a sample needs to be drawn using pre-defined criteria.  A random sample is one in which each case in the population has the same probability of being included in the sample each time a new case is selected.   A stratified sample is one in which random sampling is conducted within classifications of cases with a known proportion of cases in the population.  Stratified sampling is used to gather representative cases with enough coverage of low-proportion cases.  When analysis is conducted, the proportions for each stratification category are used to adjust the sample results for the oversampling.  A convenience sample is one in which available cases are used without controls on their representativeness.  For example, using one's friends and acquaintances, or everyone one meets at a bar, as a sample of a town's population.
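As a rough sketch of the difference between random and stratified sampling, the example below builds a hypothetical population in which one category makes up only 10% of cases, oversamples that category, and then uses the known population proportions to adjust the result. All names, group sizes, and values are invented for illustration. Note that random.sample draws without replacement.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 900 "majority" cases and 100 "minority" cases
population = (
    [("majority", rng.gauss(50, 10)) for _ in range(900)]
    + [("minority", rng.gauss(70, 10)) for _ in range(100)]
)

# Simple random sample: every case has the same chance of selection
simple = rng.sample(population, 100)

# Stratified sample: draw 50 cases at random within each category
strata = {}
for label, value in population:
    strata.setdefault(label, []).append(value)

weighted_mean = 0.0
for label, values in strata.items():
    drawn = rng.sample(values, 50)
    stratum_mean = sum(drawn) / len(drawn)
    proportion = len(values) / len(population)   # known population share
    weighted_mean += proportion * stratum_mean   # adjust for oversampling

population_mean = sum(value for _, value in population) / len(population)
print(f"population mean: {population_mean:.1f}")
print(f"stratified, weighted estimate: {weighted_mean:.1f}")
```

Without the weighting step, the 50 minority cases would count for half of the estimate even though they are only one-tenth of the population.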

standard deviation is a statistical concept used in many statistical processes.  Standard deviation is based on the variance in a set of data.  In simplest terms, one standard deviation is the square root of the average of the squared differences between each data point and the average of those same points:  $s = \sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 / (N-1)}$, where $x_1, \ldots, x_N$ are the data points, $\bar{x}$ is their average, and N is the number of cases.  There are several other formulas for calculating standard deviation.  "N-1" is used to estimate the standard deviation for a population when the data comes from a sample.  When data is for a complete population the formula uses "N."  As samples become larger, the result using "N-1" converges with the result using "N."
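A minimal sketch of the calculation follows, assuming the usual definition in which the differences from the average are squared before summing. The helper name std_dev and the data are hypothetical. Python's statistics.stdev (N-1) and statistics.pstdev (N) are used as a cross-check.

```python
from math import sqrt
from statistics import pstdev, stdev

def std_dev(data, sample=True):
    """Standard deviation: divide by N-1 for a sample, by N for a population."""
    n = len(data)
    average = sum(data) / n
    sum_of_squares = sum((x - average) ** 2 for x in data)
    divisor = n - 1 if sample else n
    return sqrt(sum_of_squares / divisor)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # average is 5, sum of squares is 32

print(std_dev(data, sample=False))  # sqrt(32 / 8) = 2.0
print(std_dev(data, sample=True))   # sqrt(32 / 7), about 2.138
print(pstdev(data), stdev(data))    # library results for comparison
```

With only 8 cases the two results differ noticeably. With thousands of cases the difference between dividing by N and by N-1 becomes negligible.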

standard error is the estimated standard deviation of a sample statistic (an estimate of a parameter, such as a mean) across repeated samples, calculated from a single sample.
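For the most common case, the standard error of a mean, a sketch of the calculation is the sample standard deviation divided by the square root of the number of cases. The function name and data below are hypothetical, and other statistics have their own standard error formulas.

```python
from math import sqrt
from statistics import stdev

def standard_error_of_mean(sample):
    """Estimated standard deviation of the sample mean: s / sqrt(N)."""
    return stdev(sample) / sqrt(len(sample))

small_sample = [2, 4, 4, 4, 5, 5, 7, 9]
larger_sample = small_sample * 4   # same spread of values, four times the cases

print(f"{standard_error_of_mean(small_sample):.3f}")   # about 0.756
print(f"{standard_error_of_mean(larger_sample):.3f}")  # smaller, since N is larger
```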

statistically significant is a judgment that a statistical procedure or test leads one to conclude that a similarity, difference, or relationship attains sufficient credibility and support.  Statistical significance means that these relationships are greater than the margin of error.  Since the size of differences and strength of relationships required to make this judgment are based in part on the number of cases being analyzed, statistical significance is not the same as evaluation of the size or strength of a relationship.  That is, while many relationships are both statistically significant and of substantial importance, some may be significant but not worth talking about.
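The dependence of significance on the number of cases can be sketched with a simplified two-group Z test. This assumes equal group sizes and a known, shared standard deviation. The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def is_significant(mean_difference, sd, n_per_group, confidence_level=0.95):
    """Return (significant?, z) for a difference between two group means."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_difference / standard_error
    critical = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return abs(z) > critical, z

# The same 1-point difference (SD of 10) with different numbers of cases
print(is_significant(1, 10, 50))     # z = 0.5, not significant
print(is_significant(1, 10, 5000))   # z = 5.0, significant
```

The difference between groups is identical in both calls. Only the number of cases changed, which is why a significant result still needs a separate judgment about whether its size matters.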

validity refers to the idea that a measurement tool's output is a solid indicator of whatever characteristics the designer has targeted.  Validity may be an issue of whether a tool accurately represents a theoretical construct.  It may also focus on the extent to which results on one measurement can be used to predict a second outcome.

variance is a characteristic of a set of data, or a series of answers.  From a statistical perspective it is a measure of how cases are distributed within a range, of how much responses differ.  If every case has the same value on some measure, then the variance is 0, otherwise there will be some level of variance. Variance is the average of the squared differences between each score and the mean average of all scores: the sum of those squared differences divided by N for a complete population, or by N-1 when estimating from a sample.
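A short sketch of variance follows, assuming the usual definition in which the sum of squared differences from the mean is divided by N (complete population) or N-1 (sample). The data are hypothetical. statistics.pvariance and statistics.variance are the standard library's population and sample versions.

```python
from math import sqrt
from statistics import pvariance, variance

identical = [5, 5, 5, 5]             # every case has the same value
spread = [2, 4, 4, 4, 5, 5, 7, 9]    # mean 5, sum of squared differences 32

print(pvariance(identical))   # 0: no case differs from the mean
print(pvariance(spread))      # 32 / 8 = 4
print(variance(spread))       # 32 / 7, about 4.571

# Standard deviation is the square root of the variance
population_sd = sqrt(pvariance(spread))   # 2.0
```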

variable is a defined condition, classification, or other concept specified by a researcher that can be measured or identified. As such, it can take on at least two and possibly more values (from present / not present through to counts and other measures of a value).

When a researcher considers two or more variables, one variable is frequently defined as the outcome or resulting condition. In this case the proposed hypothesis is that this variable is influenced by the others. The variable that is affected, that is "responding," is labeled the dependent variable. That is, the dependent variable depends on the effects of the other variables. These other variables are then referred to as independent variables. The assignment of variables as independent and dependent is made by the researcher. These assignments are frequently important in choosing and carrying out statistical procedures.



Copyright © 2013 by Peter T. Klassen, Ph.D. Principal, www.DocumentingExcellence.com
10 January, 2013