Finding a common distribution - Z-scores
Variable measurement uses units that are based on a raw scale. For example, distances can be measured in feet and inches, or in centimeters and meters. The use of these raw units of measurement in statistical analysis presents two challenges. One challenge is that of comparing and testing for differences when data set distributions are different. The other challenge is that of comparing and testing for relationships when the raw scale units of measurement are different. Each of these challenges and the statistical responses are discussed below.
First, two sets of data may have different distributions. One may have a wide range of values while the second is very narrowly distributed. These differences in distributions make it difficult to judge similarities or significant differences among categories in the data sets.
Second, two variables may be measured in different raw-scale units. In the distance example it is possible to convert from one scale to another. However, when we look at more complex relationships between variables, the raw-scale units may not be convertible. For example, a person's height and weight are measured in two different kinds of units, and one cannot be converted into the other.
In both of these challenges, converting data points from raw units to standard deviation units allows for statistical testing. Z-scores are the label for values expressed in standard deviation units, and they are sometimes referred to as "standardized" scores. Z-scores range from negative numbers through 0 to positive numbers.
A z-score of 0 is the mean of the z-score distribution, and it corresponds to the mean average of the raw-unit distribution.
Each data point's z-score is that data point's raw score minus the mean average, then divided by the standard deviation:
z-score1 = (raw-score1 - MA) / SD
Where z-score1 is the z-score for a given data point,
raw-score1 is the raw score value for that data point,
MA is the mean average for the set of data points, and
SD is the standard deviation for the set of data points.
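The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `z_scores` and the height values are made up for the example, and it assumes SD means the population standard deviation (`statistics.pstdev`); use `statistics.stdev` instead if the data are a sample.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Convert raw scores to z-scores: (raw score - MA) / SD."""
    ma = mean(raw_scores)    # MA: mean average of the data points
    sd = pstdev(raw_scores)  # SD: population standard deviation (an assumption)
    if sd == 0:
        raise ValueError("all scores are identical, so z-scores are undefined")
    return [(x - ma) / sd for x in raw_scores]


# Hypothetical heights in centimeters, for illustration only.
heights_cm = [150, 160, 170, 180, 190]
z = z_scores(heights_cm)
```

The data point equal to the mean average (170) gets a z-score of 0, scores below the mean get negative z-scores, and scores above it get positive ones.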
By converting to z-scores, data sets are converted to a distribution that has known characteristics: a mean of 0 and a standard deviation of 1. This means that it is possible to calculate the relative probabilities of different scores and of values for averages of scores. In so doing one can estimate the probability that two different categories of data points come from the same population or from significantly different populations. It is this use of z-scores that allows one to use the statistical tables of areas of the standard normal distribution to find the probability of higher and lower scores in a data set, provided the raw scores are approximately normally distributed.
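The table lookup described here can also be done in code. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of a printed table of areas. The helper names `prob_below` and `prob_above` are made up for this example, and the results are only meaningful if the raw scores are approximately normally distributed.

```python
from statistics import NormalDist

# The standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_below(z):
    """Area under the standard normal curve to the left of z."""
    return standard_normal.cdf(z)


def prob_above(z):
    """Area under the standard normal curve to the right of z."""
    return 1.0 - standard_normal.cdf(z)


# Example: the share of scores expected to fall below and above z = 1.0.
lower = prob_below(1.0)
upper = prob_above(1.0)
```

For any z-score the two areas add up to 1, and a z-score of 0 splits the distribution in half.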
By converting to z-scores, variables with different raw units are converted to comparable distributions in which the averages are set as equal (0) and the spreads are set as equal (a standard deviation of 1). When the raw scores are approximately normal, this common distribution is the standard normal distribution.
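As a rough illustration of this point, the sketch below standardizes the same heights recorded in centimeters and in inches, along with a set of weights in kilograms. All values and the function name `standardize` are hypothetical, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (value - mean average) / standard deviation."""
    ma = mean(values)
    sd = pstdev(values)  # population SD (an assumption for this sketch)
    return [(v - ma) / sd for v in values]


# Hypothetical measurements for five people, for illustration only.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
heights_in = [h / 2.54 for h in heights_cm]  # same heights, in inches
weights_kg = [52.0, 61.0, 70.0, 77.0, 90.0]

z_cm = standardize(heights_cm)
z_in = standardize(heights_in)
z_kg = standardize(weights_kg)
```

The height z-scores come out the same whether the raw unit was centimeters or inches. The height and weight z-scores both have a mean of 0 and a standard deviation of 1, so a person's relative standing on the two variables can be compared directly.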