Glossary
classification refers to the process of assigning cases to categories, where the cases share some common identification but do not have social contact, interaction, or awareness of their shared aggregation.
control (or statistical control) refers to procedures used with real-world data when the condition of "all other things being equal" cannot be attained by experimental design.
correlation refers to a statistical measure of relationship. This statistic, commonly written r (or ρ, rho), can range from -1 through 0 to +1. Positive numbers indicate that two variables or measures tend to increase or decrease together, i.e. as one increases so does the other. Negative numbers indicate that two measures tend to move in opposite directions, i.e. as one increases the other decreases. Correlations near 0 indicate that two measures show little or no linear relationship. Evaluating whether a correlation is significant (sufficiently distant from 0) requires a significance test that depends, among other things, on the number of cases analyzed.
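To make the range of the statistic concrete, here is a minimal sketch of the Pearson correlation coefficient, assuming two equal-length lists of numeric measures. The function name `pearson_r` and the sample data are illustrative only; Python 3.10+ also provides `statistics.correlation`, which computes the same coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: raises ZeroDivisionError if either list has no variation at all.
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]
print(pearson_r(hours, [2, 4, 6, 8, 10]))  # 1.0: the measures increase together
print(pearson_r(hours, [10, 8, 6, 4, 2]))  # -1.0: the measures move in opposite directions
```

Real data rarely reach exactly +1 or -1; the perfectly linear examples are chosen only to show the endpoints of the range.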
discriminate refers to a process of classifying or determining that there is an identifiable difference. It is an important statistical concept, not to be confused with the social process associated with prejudice and unjust treatment.
descriptive statistics refers to statistics used to summarize characteristics or outcomes from a data set.
experiment refers to a research design that meets a critical criterion: two or more groups of subjects are assigned to a controlled setting, condition, or experience in which the researcher manipulates the critical conditions while all other conditions are held constant. Most frequently this is accomplished by using three conditions.
group refers to two or more people who share some common identification and have social contact and interaction.
inferential statistics refers to statistics used to draw conclusions or generalizations about populations from samples, including conclusions about shared characteristics, similarities, differences, and patterns of responses.
mean average refers to the arithmetic average: the sum of a set of values divided by the number of values. It is the middle of the set in terms of amount: the values above the mean exceed it by the same total amount that the values below the mean fall short of it.
median average refers to the specific score or case at the middle of a set of cases or scores ranked in order. One-half of the cases or data points are higher and one-half lower in terms of the number of measures. When the number of cases is even, the median is usually taken as the average of the two middle scores.
mode refers to the most frequently occurring measurement. This form of average is used most frequently with categorical data and with data sets that have a limited range of possible values.
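The three averages can be compared on one small data set using Python's standard `statistics` module. The scores and color labels below are made-up illustrations; the last line checks that deviations from the mean balance out to zero.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]

print(statistics.mean(scores))    # 5: the sum (30) divided by the 6 cases
print(statistics.median(scores))  # 4.0: even count, so the middle scores 3 and 5 are averaged
print(statistics.mode(scores))    # 3: the most frequent value

# The mode also works for categories, where mean and median are undefined.
print(statistics.mode(["red", "blue", "red", "green"]))  # red

# Amounts above and below the mean cancel out exactly.
print(sum(x - statistics.mean(scores) for x in scores))  # 0
```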
population refers to the totality of cases that meet a defining criterion. A population is usually too large to be measured except by census, so sampling is used to collect representative data from which inferences can be drawn.
reliability refers to the idea that a tool is consistent in what it reports. One issue is whether a tool administered at different times produces similar scores (test-retest reliability). Another arises when two or more evaluators look at the same evidence: are the scores of each evaluator consistent with those of the others (inter-rater reliability)? A third is whether each of several items used to construct a scale loads onto that scale in a consistent manner (internal consistency).
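For the last issue, internal consistency of a multi-item scale, one widely used measure is Cronbach's alpha. The sketch below is an illustration of that one measure, not the only way to assess reliability; the function name and the item scores are hypothetical. Alpha approaches 1 when items vary together and falls toward 0 (or below) when they do not.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of item columns; items[j][i] is respondent i's score on item j.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Each respondent's total scale score is the sum of their item scores.
    totals = [sum(resp) for resp in zip(*items)]
    # Sample variances (N - 1) are used throughout; only their ratio matters.
    item_var_sum = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance(totals)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical scale: 3 items answered by 4 respondents.
items = [
    [2, 3, 4, 5],
    [2, 4, 4, 5],
    [1, 3, 5, 5],
]
print(round(cronbach_alpha(items), 2))  # 0.95: the items rise and fall together
```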
sample or representative sample refers to a set of cases drawn from the larger population. To be representative, a sample needs to be drawn using pre-defined criteria. A random sample is one in which each case in the population has the same probability of being included in the sample each time a new case is selected. A stratified sample is one in which random sampling is conducted within classifications (strata) of cases whose proportions in the population are known. Stratified sampling is used to gather representative cases while ensuring enough coverage of categories that make up a small proportion of the population. When the analysis is conducted, the known population proportions for each stratum are used to adjust the sample results for this oversampling. A convenience sample is one in which available cases are used without controls on their representativeness: for example, using one's friends and acquaintances, or everyone one meets at a bar, as a sample of a town's population.
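The stratified approach can be sketched as follows, assuming every case's value is known within each stratum list and that the same number of cases is drawn from each stratum. The function name, the stratum labels, and the numbers describe a hypothetical town and are for illustration only; the per-stratum means are weighted by each stratum's known share of the population.

```python
import random

def stratified_estimate(strata, n_per_stratum, seed=0):
    """Estimate a population mean from an equal-size random sample per stratum.

    strata: dict mapping stratum name -> list of values for every case in it.
    Each stratum mean is weighted by that stratum's known population share,
    which adjusts for oversampling the small strata.
    """
    rng = random.Random(seed)
    pop_size = sum(len(cases) for cases in strata.values())
    estimate = 0.0
    for cases in strata.values():
        # Simple random sample (without replacement) within the stratum.
        sample = rng.sample(cases, n_per_stratum)
        stratum_mean = sum(sample) / n_per_stratum
        weight = len(cases) / pop_size  # known population proportion
        estimate += weight * stratum_mean
    return estimate

# Hypothetical town: 900 in-town residents and 100 rural residents.
town = {"in_town": [40] * 900, "rural": [60] * 100}
print(round(stratified_estimate(town, n_per_stratum=50), 2))  # 42.0
```

Drawing 50 cases from each group oversamples the rural residents; an unweighted average of the pooled 100 cases would give 50, whereas weighting by the 0.9 and 0.1 population shares recovers the true population mean of 42.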
standard deviation is a statistical concept used in many statistical processes. The standard deviation is the square root of the variance of a set of data. In simplest terms, it is the square root of the average of the squared differences between each data point and the mean of those same points. For a sample, $s = \sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 / (N - 1)}$, where $x_1, \dots, x_N$ are the data points, $\bar{x}$ is their mean, and $N$ is the number of cases. There are several other formulas for calculating the standard deviation. Dividing by $N - 1$ is used to estimate the standard deviation of a population when the data come from a sample. When the data cover a complete population, the formula divides by $N$. As samples become larger, the results obtained with $N - 1$ and with $N$ converge.
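The definition can be sketched directly: square each deviation from the mean, average the squares using N - 1 (sample) or N (population), and take the square root. The function name and data are illustrative; the standard library's `statistics.stdev` and `statistics.pstdev` implement the N - 1 and N versions respectively.

```python
import math
import statistics

def std_dev(values, sample=True):
    """Square root of the average squared deviation from the mean.

    sample=True divides by N - 1 (estimating a population from a sample);
    sample=False divides by N (the data cover the whole population).
    """
    n = len(values)
    if n < 2 and sample:
        raise ValueError("a sample standard deviation needs at least 2 values")
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_devs / divisor)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data, sample=False))  # 2.0 (divides by N = 8)
print(std_dev(data, sample=True))   # about 2.14 (divides by N - 1 = 7)
# The standard library gives matching results:
print(statistics.pstdev(data), statistics.stdev(data))
```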
standard error is the estimated standard deviation of a statistic based on a sample.
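The most familiar case is the standard error of the mean, estimated as the sample standard deviation divided by the square root of the number of cases. A minimal sketch, with a hypothetical function name and made-up data:

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(standard_error_of_mean(sample), 3))  # 0.756
```

Because the number of cases sits under a square root, quadrupling the sample size roughly halves the standard error when the spread of the data stays about the same.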
statistically significant describes a judgment, based on a statistical procedure or test, that a similarity, difference, or relationship has sufficient credibility and support. Statistical significance means that the observed relationship is larger than would be expected from sampling error alone (the margin of error). Because the size of difference or strength of relationship needed to reach this judgment depends in part on the number of cases being analyzed, statistical significance is not the same as an evaluation of the size or strength of a relationship. That is, while many relationships are both statistically significant and of substantial importance, some may be statistically significant yet too small to be worth talking about.
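The dependence on the number of cases can be illustrated with the standard t test for whether a correlation differs from 0, t = r·√(N − 2) / √(1 − r²). The sketch assumes a cutoff of 1.96, the large-sample two-tailed value at the 0.05 level; for small samples the true cutoff is somewhat larger (about 2.05 for 30 cases), which does not change the conclusion here. The function name is hypothetical.

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing whether a Pearson correlation r differs from 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumed cutoff: two-tailed 0.05 level, large-sample (normal) approximation.
CRITICAL_T = 1.96

for n in (30, 10_000):
    t = t_for_correlation(0.05, n)
    verdict = "significant" if abs(t) > CRITICAL_T else "not significant"
    print(f"r = 0.05, n = {n}: t = {t:.2f} -> {verdict}")
```

The same weak correlation gives t of about 0.26 with 30 cases but about 5.0 with 10,000 cases, so it becomes statistically significant even though r = 0.05 accounts for only 0.25% of the variation (r² = 0.0025).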
validity refers to the idea that a measurement tool's output is a sound indicator of whatever characteristic the designer has targeted. Validity may be an issue of whether a tool accurately represents a theoretical construct. It may also concern the extent to which results on one measurement can be used to predict a second outcome.

